Best Practices for Disaster Recovery
Those unfamiliar with PaaS options may at times ask, “what’s the true benefit of using a Platform as a Service?” They elaborate by saying, “heck, I can install Ruby (or Node.js, PHP, MySQL, PostgreSQL, etc.), deploy my application and monitor the systems myself!” This is definitely true. There are thousands of companies doing their own DevOps today and that pattern works for them.
Where a PaaS like Engine Yard really shines is when a company doesn’t have the developer resources, in-house expertise, or contractor budget to properly manage their production infrastructure. A PaaS allows a development team of any size to focus on their application instead of their infrastructure, thus making them more productive and providing more “bang for the buck” with development dollars spent. Which would you rather do as a developer: write code, or get tied down in several days of yak shaving while building a new production cluster? And what if that cluster has a hardware failure at 4AM—how do you feel about being “on call”?
We are going to explain what it takes to build, monitor, support, and manage a production web application cluster at medium scale, and then contrast that with the equivalent steps when using the Engine Yard PaaS.
Running your own production setup has its place and its merits. Developers can learn a great deal by running their own production cluster, and those lessons will help them write more stable and efficient applications. However, as with all things, there are tradeoffs of time and effort (and therefore budget) to consider, and in those cases a PaaS may very well be a better option, especially for small teams. If you’re on the fence and not sure which way to go, this ebook aims to illuminate the differences in control, time, and cost to help you make an informed decision that best benefits your team, your client(s), your project, and your users long-term.
Let’s start with the basics. In most modern applications, you’ll want to use a cloud hosting provider due to the cost savings and disposability of virtual machines. At Engine Yard, we use Amazon EC2 for our underlying infrastructure. You have plenty of choices out there, including EC2, Rackspace Cloud, and HP Cloud. Each has its pros and cons. For the purposes of this ebook, we’re going to compare a “DIY” setup with Engine Yard.
Some organizations and/or applications may be better suited to running bare metal in a co-located data center. Remember that you’ll likely face some complex contracts and logistics in the process of putting hardware in that data center. A fully managed solution will be simpler to set up, but you won’t own the hardware, and such solutions can be rather expensive and at times involve contractual obligations of their own.
After deciding where your application will be hosted and on what type of platform, you would need to make a choice as to your Linux distribution. Our general advice would be to use whatever Linux distribution your team feels most comfortable with and that has the best overall community and/or commercial support.
Once you’ve made the above basic decisions and assessed your traffic expectations, you can start building out an initial production cluster. For the purposes of this ebook, we’re going to assume you’ve decided to run a Rails app on Amazon EC2, or a similar cloud service, directly. As you can see from the following comparison, you have complete control over each step of setting up a cluster on your own, but at a significant trade-off: time and effort. The amount of work we’ve seen put into standing up a cluster can range from a day to a week, depending on the complexity involved and how many surprises get thrown your way.
With Engine Yard, or any properly built PaaS frankly, that “day to week” timeframe shrinks to a matter of minutes. The process involved in the Engine Yard section can be completed in under an hour in most cases (loading an existing database and DNS propagation notwithstanding). However, there is a trade-off here too: flexibility. Any form of automation, by its very nature, leaves you with less flexibility, which prompts another question: “How can I get that flexibility back, and how much flexibility do I need?”
On Engine Yard, we solve this flexibility problem with Chef. An automated configuration tool written in Ruby, Chef is designed to be easy to learn and to help you automate and standardize system configuration by writing code. Since virtual machines are treated as disposable commodities, you need configuration automation to make certain that all nodes in any given cluster are identical and that their configurations are repeatable in the future, even when being stood up from scratch from an empty data volume.
Using custom Chef recipes, you can exercise nearly total control over your cluster configuration. Once written, properly tested, and in use, custom Chef recipes allow you to automatically configure any aspect of any of your clustered instances at boot time, meaning that you can trash one instance and spin up another without having to worry about configuration at all. It just happens automatically when you have custom Chef recipes uploaded for a given environment.
Any application that’s built to be successful is eventually going to need to scale. In this example we’re starting out with the most basic cluster we realistically can operate in production: one application instance and one database master instance. At some point however, you’re going to need to add more application instances to handle load from end users. Additionally, you may run into performance issues using a single database master for both reads and writes. At scale, it just won’t keep up and will take forever to return query results to your application instances, so you’ll need to horizontally scale your database as well.
Let’s start with a discussion on scaling the application tier. You’ll need to duplicate your existing application server, which is easy if the entire thing is EBS-backed on EC2. However, having an entire instance based on EBS can make for slow performance, so you may not have opted for that route. Either way, you need a way to quickly and easily make another of those servers.
Once that’s done, you also need a way to tell incoming traffic to be load balanced between those servers. You can do this with a virtual load balancing appliance such as an Elastic Load Balancer, or with software, such as haproxy.
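To make the idea of balancing traffic concrete, here is a minimal sketch of round-robin selection, the simplest strategy a load balancer can apply. It is an illustration only, not how Elastic Load Balancer or haproxy are implemented; the function name and the backend addresses are hypothetical.

```python
from itertools import cycle


def make_round_robin(backends):
    """Return a function that hands out backends in strict rotation."""
    if not backends:
        raise ValueError("at least one backend is required")
    rotation = cycle(list(backends))
    return lambda: next(rotation)


# Hypothetical application instance addresses.
pick_backend = make_round_robin(["10.0.0.11:8080", "10.0.0.12:8080"])
first_four = [pick_backend() for _ in range(4)]
```

Each call to `pick_backend()` returns the next server in the pool, so four requests alternate evenly between the two instances. A real load balancer also removes unhealthy servers from the rotation, which this sketch leaves out.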
Next, you have to deploy your code to the second (or third, fourth, fifth, and so on) application server in your cluster and verify that it’s been set up correctly, is secured, has been added to the load balancer pool, and is actually serving up the right code.
Finally, you have to alter your deployment process to push your new code to all application servers in your cluster, and devise a strategy for doing it with minimal downtime.
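One common strategy for deploying with minimal downtime is a rolling deploy: update one server at a time while the rest keep serving traffic. The sketch below assumes you supply your own callables for the four steps; `deploy`, `remove_from_pool`, `add_to_pool`, and `health_check` are hypothetical hooks, not part of any particular tool.

```python
def rolling_deploy(servers, deploy, remove_from_pool, add_to_pool, health_check):
    """Deploy to one server at a time so the others keep serving traffic."""
    updated = []
    for server in servers:
        remove_from_pool(server)      # stop sending new requests to this server
        deploy(server)                # push the new code
        if not health_check(server):  # verify before it takes traffic again
            raise RuntimeError(f"{server} failed its health check; halting deploy")
        add_to_pool(server)
        updated.append(server)
    return updated


# Example run with stand-in hooks that only record what would happen.
events = []
result = rolling_deploy(
    ["app1", "app2"],
    deploy=lambda s: events.append(("deploy", s)),
    remove_from_pool=lambda s: events.append(("remove", s)),
    add_to_pool=lambda s: events.append(("add", s)),
    health_check=lambda s: True,
)
```

Halting on the first failed health check leaves the remaining servers on the old code and still in the pool, which is usually preferable to pushing a broken release everywhere. With only one application server there is nothing left to serve traffic during the update, so this approach needs at least two.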
If you have an application that accepts uploaded assets and does something with them, please note: moving from a single application server to a multiple application server cluster is going to cause you pain no matter how you do it. The reason is that a request carrying the uploaded data (say, a forum avatar) goes to only one application server. The file gets saved on that instance’s disk but not on the others, so future requests to /path/to/wherever/your/static/assets/are/avatar.png return a 404 from every server except the one that processed the original upload.
This is why you really need to upload your assets to Amazon S3 or a similar storage service (Rackspace Cloud Files for example). Many gems support this pattern already, and for non-Ruby users, I’m sure there are libraries and/or examples on how to achieve similar results in your language of choice.
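The difference between local disk and shared storage can be shown with a small simulation. This is a toy model, not an S3 client: the `AppServer` class is hypothetical, and a plain dictionary stands in for the storage service.

```python
class AppServer:
    """Toy application server that stores uploads locally or in a shared store."""

    def __init__(self, name, shared_store=None):
        self.name = name
        self.local_disk = {}
        self.shared_store = shared_store

    def _store(self):
        # Use the shared store when one is configured, else this server's own disk.
        return self.shared_store if self.shared_store is not None else self.local_disk

    def upload(self, path, data):
        self._store()[path] = data

    def fetch(self, path):
        return 200 if path in self._store() else 404


# Local disk: only the server that handled the upload can serve the file.
local_a, local_b = AppServer("app1"), AppServer("app2")
local_a.upload("/avatars/42.png", b"image-bytes")
local_statuses = (local_a.fetch("/avatars/42.png"), local_b.fetch("/avatars/42.png"))

# Shared store (standing in for S3): every server sees the same file.
bucket = {}
shared_a, shared_b = AppServer("app1", bucket), AppServer("app2", bucket)
shared_a.upload("/avatars/42.png", b"image-bytes")
shared_statuses = (shared_a.fetch("/avatars/42.png"), shared_b.fetch("/avatars/42.png"))
```

With local disks the second server answers 404 for a file the first one accepted; with the shared store both answer 200, no matter which server the load balancer picks.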
Now that you’ve added your application instances, you’ll want to scale out your database tier with one or more replicas. To do this, your application first should be configured via whatever gem or library you choose to perform reads from replicas and writes to the master. This applies primarily to standard SQL databases (MySQL, PostgreSQL).
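The routing rule itself is simple, and a rough sketch may help show what such a gem or library does for you. The function below is a hypothetical illustration that looks only at the first word of the SQL statement; real libraries also handle transactions and other cases this ignores.

```python
import random

READ_VERBS = ("SELECT", "SHOW")


def choose_database(sql, master, replicas, rng=random):
    """Send reads to a replica and everything else to the master."""
    words = sql.split()
    verb = words[0].upper() if words else ""
    if verb in READ_VERBS and replicas:
        return rng.choice(replicas)
    # Writes, unknown statements, and reads with no replicas all go to the master.
    return master
```

Treating anything that is not clearly a read as a write is the conservative choice: a misrouted read only adds load to the master, while a misrouted write would fail or be lost on a replica. Keep in mind that replicas can lag slightly behind, so a read issued immediately after a write may not see the new data.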
Let’s start by examining what you’d have to do to add a single database slave to a MySQL cluster. PostgreSQL is discussed after that.
When establishing MySQL replication, recording the position and filename of the binlog on the database master is of paramount importance, because that’s how your slave knows where to start replicating from. In situations where an rsync is required to move data from master to slave (as opposed to a snapshot, which can itself include any changes that occur while it is being taken), you’ll need to keep the database locked until the rsync is complete to avoid possible data loss or other issues.
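The recorded binlog filename and position end up in the statement the new slave runs to attach itself to the master. The helper below is a small sketch that builds that statement using the classic `CHANGE MASTER TO` syntax (newer MySQL releases also offer `CHANGE REPLICATION SOURCE TO`). The function name and all example values are hypothetical, and the inputs are assumed to be trusted since no quoting is applied.

```python
def change_master_statement(host, user, password, log_file, log_pos):
    """Build the statement a new replica runs to start from the recorded coordinates."""
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST='{host}', "
        f"MASTER_USER='{user}', "
        f"MASTER_PASSWORD='{password}', "
        f"MASTER_LOG_FILE='{log_file}', "
        f"MASTER_LOG_POS={int(log_pos)};"
    )


# Hypothetical coordinates, as reported by SHOW MASTER STATUS on the master.
statement = change_master_statement(
    "10.0.0.20", "repl", "secret", "mysql-bin.000042", 1337
)
```

In a typical manual setup the master is locked with `FLUSH TABLES WITH READ LOCK`, the coordinates are read from `SHOW MASTER STATUS`, the data is copied, the lock is released with `UNLOCK TABLES`, and the slave then runs the statement above followed by `START SLAVE`.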
Setting up replication with PostgreSQL is different, at least from the “DIY” standpoint. There are multiple replication strategies that can be used for different reasons and architectures, and you’d first have to assess which is best for your application and cluster configuration. Once that’s done, you would need to create new PostgreSQL replica servers, rsync files from the master to the replica, make configuration changes and edits on both, and then restart the servers. The PostgreSQL wiki has a great getting-started tutorial on the subject. Note that on Engine Yard, all this is handled for you through the exact same interface as mentioned for MySQL above: just “add to cluster” and you’re done.
Being alerted to problems with replication is also a key factor. On Engine Yard this is already monitored for you; in a DIY setup, you would need to set up monitoring of replication and of the overall health of both databases yourself.
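For MySQL, a basic replication check can be built on the row returned by `SHOW SLAVE STATUS`. The sketch below assumes that row has already been fetched into a dictionary; the function name and the 60-second lag threshold are hypothetical choices, while `Slave_IO_Running`, `Slave_SQL_Running`, and `Seconds_Behind_Master` are the column names MySQL reports.

```python
def replication_problems(status, max_lag_seconds=60):
    """Return human-readable problems found in a SHOW SLAVE STATUS row."""
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread is not running")
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL here when replication is stopped or broken.
        problems.append("replication lag is unknown")
    elif int(lag) > max_lag_seconds:
        problems.append(f"replica is {lag}s behind the master")
    return problems
```

An empty list means the replica looks healthy. Anything else is worth an alert, since a stalled replica is of little use when you need to promote it.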
Having a database replica or two in your cluster is always a good practice to help you quickly recover from a database-impacting event by promoting a replica to master status, but that shouldn’t be the complete sum of your database backup strategy. You should regularly snapshot and/or perform SQL dumps of your data and store them.
Any application of sufficient size will produce a multitude of logs, and without proper log management in place, those logs can fill a disk rather quickly.
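Size-based rotation is one way to put a ceiling on log growth. The sketch below uses `RotatingFileHandler` from Python’s standard library as an example of the technique; the function names, the 10 MB limit, and the five backups are hypothetical values, and system-level tools such as logrotate achieve the same result for logs your application does not write itself.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(name, path, max_bytes=10 * 1024 * 1024, backups=5):
    """Create a logger whose file is rotated once it reaches max_bytes."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def worst_case_bytes(max_bytes, backups):
    """Approximate upper bound on disk used: the live file plus every backup."""
    return max_bytes * (backups + 1)
```

With these defaults the log files occupy roughly 60 MB at most, because the oldest backup is deleted each time a new one is created.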
The DIY sysadmin’s job is still not done. Once your cluster is built and running the way you want it, you’ll need to implement a monitoring system of some form.
There are two basic “observational vectors” for monitoring. The first is what we’ll call “internal”— monitoring that’s internal to the system, running on a host that may warn you if memory and swap become dangerously low, or if CPU usage spikes.
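An internal check boils down to sampling a few host metrics and comparing them against limits. The sketch below uses only Python’s standard library, so it samples disk usage and load average; memory and swap would need `/proc/meminfo` or a third-party library. The function names, metric names, and limits are hypothetical.

```python
import os
import shutil


def sample_host_metrics(path="/"):
    """Collect a few host metrics (os.getloadavg is available on Unix only)."""
    usage = shutil.disk_usage(path)
    load_1m, _, _ = os.getloadavg()
    return {
        "disk_used_pct": 100 * usage.used / usage.total,
        "load_per_cpu": load_1m / (os.cpu_count() or 1),
    }


def check_thresholds(metrics, limits):
    """Return the names of the metrics that exceed their configured limit."""
    return sorted(
        name for name, value in metrics.items()
        if name in limits and value > limits[name]
    )
```

A monitoring agent would call `check_thresholds(sample_host_metrics(), limits)` on a schedule and send an alert whenever the returned list is not empty.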
The second observational vector we consider to be “external”— e.g. a site uptime monitor. You need to know if your site goes down and you’ll need a separate service, such as SiteUptime or Pingdom, or a multitude of others, to be able to alert you if that happens.
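At its core, an external uptime check is just an HTTP request made from somewhere outside your cluster. The sketch below shows the idea with Python’s standard library; the function names are hypothetical, and a hosted service adds what this lacks, such as checks from multiple locations, scheduling, and alerting.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the site is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, and so on


def site_is_up(url, fetch=http_status):
    """Treat any 2xx or 3xx response as up; errors and no response as down."""
    status = fetch(url)
    return status is not None and 200 <= status < 400
```

The `fetch` parameter exists so the decision logic can be exercised without touching the network, for example `site_is_up("https://example.com", fetch=lambda url: 503)`.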
Setting up external monitoring is quite a bit easier, as it usually involves purchasing a monthly service from a third party such as Pingdom or SiteUptime. However, depending on your choices, you may pay additional fees compared with what’s already built into the Engine Yard platform.
In addition to the monitoring available to you by default with Engine Yard, you can use the AppFirst and/or New Relic addons to obtain metrics about your application(s) and servers. Enabling these on Engine Yard takes a matter of minutes, whereas doing so by hand may take hours to days, depending on how billing accounts need to be set up and departmental approval that may be needed.
Every production system at one point or another experiences an unexpected event. Hardware failure is far from unheard of on systems like Amazon EC2, weather events can impact operations, user load can spike unexpectedly, a multitude of security vulnerabilities could be published or exploited, and so on. Unfortunately, these things have a tendency to happen at the worst possible times. Murphy’s law applies: the least opportune time for a system to go down is when it will. So the question then becomes: how fast can you respond?
There are two possible ways these situations are handled with Engine Yard.
Standard support customers:
Premium support customers:
In summary, running your own application cluster at scale can be very time-consuming to set up properly. Even when “finished,” one still has to manually respond to issues and events, apply security patches and upgrades, and keep software versions up to date. A Platform as a Service offering like Engine Yard automates the vast majority of that process without sacrificing flexibility or control, allowing a development team to focus purely on their application.
At the end of the day, for any small to medium sized business, and for departments within the enterprise, it’s about budget and money. You’re going to spend money on hardware (physical or virtual) one way or another. The question therefore becomes, is the cost of a PaaS less than, or greater than, that of a systems administrator, and do you necessarily need a full time systems administrator for your application? In some cases you absolutely will need a dedicated systems administrator, but in most cases, a PaaS can provide agility, capabilities, and access to expertise otherwise not attainable at such a low cost.