Lately there have been lots of benchmarks of various Cloud services floating around. It appears that it’s all the rage. There are plenty of things to take from these benchmarks, but I haven’t really seen anyone knit them together into a coherent view of what the Cloud means in terms of your web applications.
For this post, let’s take a look at Amazon’s EC2. We’ve used it here at Engine Yard, our friends at RightScale use it, and there’s generally a pretty good amount of information about it floating around the web. EC2 has a very distinct and interesting behavior profile. Let’s take a moment to see what consensus there is about that behavior. The salient points of most coverage I can find are roughly:
- EC2 is fairly variable in terms of absolute performance.
- EC2 is at least within the same ballpark as most commodity equipment.
- EC2’s EBS performs slightly less well than native disks.
I think everyone with any experience would have guessed the above. Machines are going to have little differences. Small differences in the randomized layout generated by randomized resource allocations have noticeable effects on the behavior of applications. EBS, being some sort of network attached disk, runs slower than a normal disk because it’s more than just a disk. All of this makes complete sense.
Given the above, the question is “Why are these benchmarks important?” Some more pedantic types would chime in that seeing what you expect on a benchmark is good confirmation. Others with a need-for-speed will claim that this is evidence that rolling your own infrastructure is always preferable because of the aggregate speed benefits. However, I think that these benchmarks only serve to show what is most important to Amazon. I also humbly suggest that what’s good for Amazon is good for you. At least, it should benefit you if your sites grow at all.
In general, these benchmarks show that EC2 is designed for making scalable applications. Its performance isn’t top-of-the-line, but it’s not abysmal either. An EC2 instance is appreciably slower than bare metal, but it’s instantly replaceable. EBS isn’t crazy fast, but it’s a portable, durable data store.
This is what most of the benchmarks out there are missing. In interpreting these benchmarks, I immediately realized that these are not the metrics that matter to scalable applications. Raw performance has at most an instantaneous, linear effect on your application. Every one of these metrics is only concerned with raw performance. As such, they’re not really useful.
A noteworthy suite of benchmarks would measure metrics like “cost-to-grow per request.” Such metrics are slippery and difficult to nail down, but they are where EC2 really shines. The clear message from EC2’s benchmarks is that performance isn’t king; scalability is, and effective application architecture isn’t easily benchmarked. EC2 is designed for applications that can scale. Such applications don’t demand the highest performance; they demand moderate performance. They don’t demand the utmost high availability; they demand tolerance of failures.
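To make “cost-to-grow per request” a little less slippery, here is one rough way it could be written down. This is only a sketch: the definition (hourly cost of one more instance divided by the extra requests it serves), the function name, the `scaling_efficiency` parameter, and every number below are assumptions made up for illustration, not measurements of EC2 or anyone’s real pricing.

```python
def cost_to_grow_per_request(instance_cost_per_hour, requests_per_hour,
                             scaling_efficiency=1.0):
    """Hourly cost of adding one instance, divided by the extra requests it serves.

    scaling_efficiency (0 < e <= 1) is a hypothetical discount for coordination
    overhead; 1.0 means the application scales perfectly linearly.
    """
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    return instance_cost_per_hour / (requests_per_hour * scaling_efficiency)


# Hypothetical numbers, for illustration only.
# A modest instance: cheap, moderate throughput.
modest = cost_to_grow_per_request(0.10, 100_000)
# A faster machine: twice the throughput at four times the hourly cost.
fast = cost_to_grow_per_request(0.40, 200_000)

print(f"modest: ${modest:.7f} per added request")
print(f"fast:   ${fast:.7f} per added request")
```

With these made-up inputs, the slower instance is the cheaper way to grow, even though it would lose any raw-performance benchmark. Real numbers could easily point the other way; the point is only that this is a different question from the one the benchmarks answer.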
From that angle, it’s pretty clear that both camps are right, sort of. EC2 benchmarks exactly as we’d expect (making the pedants happy), and it’s built to deliver applications at the top end of throughput (theoretically making the speed-addicted happy). The catch is that there is a fundamental switch in how you view throughput. Rather than being about performance, it’s about aggregate performance (sometimes called scalability). That has everything to do with the type of applications on which Amazon was built.
So, how should we interpret the benchmarks of EC2? We should take them to mean that the Cloud is about building applications in ways that weren’t previously possible. We should see that in a world where hardware is commoditized, squeezing out the last drops of performance plays second fiddle to adding hardware. We should notice that making applications tolerate failures is the new “high availability.”
In my opinion, it’s an interpretation that’s long overdue.

It is normal for very large distributed systems that you have to relax your HPC requirements if you want high scalability.
We have run a test at the European Space Agency with the Amazon Cloud. And we have come up with similar conclusions.
We have shared the presentation slides
http://tinyurl.com/pz9xjt
All the best, good post! Alfonso http://www.linkedin.com/in/alfonsooliassanz
Hi Alfonso – thanks for the comment. I'm interested in seeing your presentation slides, but the link you provided in your comment is broken.
Hi Nick
Sorry for the link but we found that we have some copyrighted material from ESA and we have to wait for a permission. As soon as they give us the Ok I will put it online again.
Cheers Alfonso
More important than EBS being slower than disks is that it's got an absolute maximum sequential read/write speed. Since it's a network-based solution (presumably iSCSI or similar) there is a maximum throughput of 125MBps, well short of a real RAID solution. Add to this the following considerations:
So, bottom line is that random read/writes should be pretty good, but any kind of sequential read/write workload will be problematic. Many OLTP databases have a roughly 30% sequential write workload.
See the following for some examples of workload I/O profiles:
http://blogs.msdn.com/tvoellm/archive/2009/05/07/...
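The 125MBps figure in the comment above is what you get from dividing a 1 Gbit/s link by 8 bits per byte. The comment does not state the link speed, so treat 1 Gbit/s as an assumption; the sketch below also ignores protocol overhead, and the 100 GB scan size is a made-up example.

```python
def link_ceiling_mb_per_s(link_gbit_per_s):
    """Upper bound on throughput in MB/s for a link speed in Gbit/s (decimal units)."""
    return link_gbit_per_s * 1000 / 8


def seconds_to_transfer(size_gb, throughput_mb_per_s):
    """Best-case time to move size_gb gigabytes at a sustained throughput."""
    return size_gb * 1000 / throughput_mb_per_s


# Assumption: the volume sits behind a single 1 Gbit/s network link.
ceiling = link_ceiling_mb_per_s(1.0)

# Hypothetical workload: one sequential pass over 100 GB of data.
scan_seconds = seconds_to_transfer(100, ceiling)

print(f"ceiling: {ceiling} MB/s")
print(f"100 GB sequential scan: at least {scan_seconds} s")
```

Under those assumptions a full sequential pass can never finish faster than the link allows, no matter how fast the disks behind it are.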