This is part of a series on preparing your Rails application for Black Friday. This disaster recovery plan is useful for sites that need to recover quickly when an entire AWS region goes down.
First of all, this isn't the only way to set up disaster recovery. We've used this with success for Engine Yard customers who need to fail over to a different AWS region in a timely manner.
Engine Yard runs your application on an AWS region of your choice. AWS currently operates in 16 regions with 6 more on the way. Each region is completely independent and resources are not replicated across regions. A region consists of multiple Availability Zones (AZ). Each AZ is isolated; you can think of it as a data center. AZs within a region are connected through low-latency links.
Engine Yard places instances in multiple AZs. Because the AZs are isolated, your app keeps running if one AZ has an issue. There are a few exceptions, such as when your database runs in the affected AZ. For exactly this reason, we recommend running a database replica in a different AZ. A database replica can be promoted in a matter of minutes.
By default, your application hosted at Engine Yard is already highly available because of the use of multiple AZs. Each AZ is a separate data center (in some cases multiple data centers) so an issue in one AZ usually doesn't affect another.
There are rare cases when an issue occurs region-wide. Running on multiple AZs doesn't help you here. This is the scenario where you want a disaster recovery (DR) environment in a different region. This adds cost and complexity to your setup. Engine Yard aims to make it easier for you in both areas.
Let's say you have a Production environment running in N. Virginia (US East 1) and you want a DR environment in Oregon (US West 2). These 2 environments look the same in the beginning. If you have 2 app instances, 1 database master, 1 database replica on the Production environment, you'll have the same number of instances on DR.
To connect these 2 environments, we'll set up the DR database master as a replica of the Production database master. It might sound confusing at first: a database master that is also a replica. This commonly used setup is called chain replication, and it is the heart of our disaster recovery plan. Your app writes data to the Production environment in N. Virginia, and that data gets replicated to the DR environment in Oregon.
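To make the topology concrete, here is a minimal sketch that models the chain in plain Python. It is only an illustration: the `DbNode` class, the node names, and the sample row are made up for this example, and real replication is asynchronous rather than an immediate function call.

```python
from typing import List, Optional


class DbNode:
    """A database instance in a replication chain (illustrative model only)."""

    def __init__(self, name: str, region: str, read_only: bool = True) -> None:
        self.name = name
        self.region = region
        self.read_only = read_only
        self.source: Optional["DbNode"] = None  # node this one replicates from
        self.replicas: List["DbNode"] = []      # nodes replicating from this one
        self.rows: List[str] = []

    def replicate_from(self, source: "DbNode") -> None:
        self.source = source
        source.replicas.append(self)

    def write(self, row: str) -> None:
        # Only a writable node accepts writes from the application.
        if self.read_only:
            raise PermissionError(f"{self.name} is read-only")
        self._apply(row)

    def _apply(self, row: str) -> None:
        # Apply the change locally, then pass it down the chain.
        self.rows.append(row)
        for replica in self.replicas:
            replica._apply(row)


prod_master = DbNode("prod-master", "us-east-1", read_only=False)
prod_replica = DbNode("prod-replica", "us-east-1")
dr_master = DbNode("dr-master", "us-west-2")
dr_replica = DbNode("dr-replica", "us-west-2")

prod_replica.replicate_from(prod_master)
dr_master.replicate_from(prod_master)  # a master that is also a replica
dr_replica.replicate_from(dr_master)   # second link in the chain

prod_master.write("order #1")
print(dr_replica.rows)
```

A write to the Production master reaches the DR replica through the DR master, while a direct write to the DR master is rejected because it is still read-only.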
In the event of a region-wide issue in N. Virginia, you can point your website to the DR environment and continue serving traffic. I make it sound simple but we all know the devil is in the details.
Disaster Recovery Plan: Chef Recipe
We use a custom Chef recipe to set up replication and to fail over to the DR environment. To use the recipe, you or Support needs to generate a new SSH keypair and add the public key to Engine Yard.
Configure the attributes in the dr_replication cookbook. You'll need the public hostnames of the database masters on the Production and DR environments. Check the README of the repository for more details.
The Chef recipe does the following to set up replication:
Add the SSH keypair to the database instances on the Production and DR environments.
Start an SSH tunnel from the DR database master to the Production database master.
Create a backup of the Production database master and load it on the DR database master using XtraBackup.
Change the MASTER of the DR database master to the Production database master. In this setup, the DR database master is a replica of the Production database master.
Set the DR database master to read-only.
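For MySQL, the tunnel and replication steps above could look roughly like the sketch below. This is not the code in the dr_replication cookbook. The function names, the `deploy` user, the local port 13306, and the sample values are all assumptions for illustration. It uses the older `CHANGE MASTER TO` syntax, which matches the MASTER and slave terminology in this post; newer MySQL versions also offer `CHANGE REPLICATION SOURCE TO`. The binary log file and position would come from the backup (XtraBackup records them in `xtrabackup_binlog_info`). The functions only build strings and assume trusted input; they don't run anything.

```python
import shlex
from typing import List


def tunnel_command(prod_host: str, key_path: str, local_port: int = 13306,
                   ssh_user: str = "deploy") -> str:
    """Build the ssh command, run on the DR master, that forwards a local
    port to MySQL (port 3306) on the Production master."""
    args = [
        "ssh", "-N",                           # forward only, no remote command
        "-i", key_path,                        # private key of the new keypair
        "-L", f"{local_port}:127.0.0.1:3306",  # local port -> remote MySQL
        f"{ssh_user}@{prod_host}",
    ]
    return " ".join(shlex.quote(arg) for arg in args)


def replication_statements(log_file: str, log_pos: int, repl_user: str,
                           repl_password: str,
                           local_port: int = 13306) -> List[str]:
    """Build the SQL that makes the DR master a read-only replica.
    Replication connects to the local end of the SSH tunnel."""
    return [
        "CHANGE MASTER TO MASTER_HOST='127.0.0.1', "
        f"MASTER_PORT={local_port}, MASTER_USER='{repl_user}', "
        f"MASTER_PASSWORD='{repl_password}', "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};",
        "START SLAVE;",
        "SET GLOBAL read_only = ON;",
    ]


# Hypothetical values, shown only to illustrate the output.
print(tunnel_command("prod-db.example.com", "/home/deploy/.ssh/dr_key"))
for statement in replication_statements("mysql-bin.000042", 120,
                                        "replication", "secret"):
    print(statement)
```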
For PostgreSQL, replication is set up by copying files from the PostgreSQL
Failover to the DR Environment
To fail over to the DR environment, the Chef recipe does the following:
Remove replication files from the DR database master.
Disable read-only on the DR database master so it can accept writes.
Stop the slave thread and restart the MySQL process.
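At the SQL level, the MySQL side of these steps might be sketched as follows. Again, this is an assumption about how you could do it by hand, not the recipe's actual code: `RESET SLAVE ALL` stands in for removing the replication files, the statement order is our own choice, and restarting the MySQL process is left out. The `run_failover` helper is hypothetical. It takes any function that executes one statement, so you can pass `print` for a dry run.

```python
from typing import Callable, List


def failover_statements() -> List[str]:
    """SQL to promote the DR database master (sketch, older MySQL syntax)."""
    return [
        "STOP SLAVE;",                  # stop the replication threads
        "RESET SLAVE ALL;",             # forget the Production master's details
        "SET GLOBAL read_only = OFF;",  # let the DR master accept writes
    ]


def run_failover(execute: Callable[[str], None]) -> int:
    """Pass each statement to `execute` in order and return how many ran."""
    statements = failover_statements()
    for statement in statements:
        execute(statement)
    return len(statements)


# Dry run: print the statements instead of sending them to MySQL.
run_failover(print)
```

If `read_only` is also set in the MySQL configuration file, it would need to be changed there too, otherwise a restart would make the DR master read-only again.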
Things to Consider
Database replication and failover are essential to your disaster recovery plan but they're not the only things you have to think about. Here are some things you should also be aware of:
The DR environment costs money even when it isn't serving requests. You can stop a few instances to save some money. For example, you don't need the DR replica until you fail over, and you can keep a few app instances on standby and start more when you fail over.
Replicate other data stores. If you're using Redis or other databases, you should set up replication and have a recovery plan for them too.
Stop some processes. If you have background job workers, you don't want them running on the DR environment until you fail over.
Test your failover process regularly. You don't want your first failover to the DR environment to happen during an actual outage. Do dry runs so you can work out any problems with the failover process.
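One check you might add to a dry run is confirming that the DR database master is actually replicating before you start. The sketch below parses the text output of MySQL's `SHOW SLAVE STATUS\G` and looks at three of its fields. The function names, the sample output, and the 60-second lag threshold are assumptions for illustration; pick a threshold that matches how much data you can afford to lose.

```python
from typing import Dict


def parse_slave_status(text: str) -> Dict[str, str]:
    """Turn the `Key: Value` lines of SHOW SLAVE STATUS\\G into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the row header
            status[key.strip()] = value.strip()
    return status


def replication_healthy(status: Dict[str, str],
                        max_lag_seconds: int = 60) -> bool:
    """True when both replication threads run and lag is within the limit."""
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master", "NULL")
    # Lag is reported as NULL when replication is not running.
    return lag.isdigit() and int(lag) <= max_lag_seconds


# Hypothetical, shortened output from the DR database master.
sample = """
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 3
"""

print(replication_healthy(parse_slave_status(sample)))
```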
Engine Yard Can Help You
Do you want us to help you set up a disaster recovery environment? Whether you're already on AWS or not, we can help make sure you recover quickly in case of region-wide problems. Contact us for a free consultation.