There are many opinions in the air about the impact that virtualization has on performance, so I thought a short blog would be good to explain (as best I can) virtual machine performance characteristics with pointers to relevant benchmarks and technical papers.
My background is that I was an early Product Manager working on VMware ESX Server (from version 1.5) and, among other things, ran product management for VMware for a few years. As a product management guy, I kept track of the output of the engineering performance group, and as a result had a reasonably high-level (although never code-level) understanding of the whys and wherefores of virtualization performance. Although I’m not as fresh on virtualization as I once was, I’ll try to do my best here. I also want to thank Steve Herrod at VMware and Simon Crosby at Citrix for providing a technical sanity check on the blog contents, although I retain responsibility for any mistakes and oversights.
First, a solid statement: virtualization has always levied a CPU “tax.” Early on, this was very high, recently not so much. Probably the most comprehensive recent non-vendor benchmark of performance vs. native is AnandTech’s, which recently showed anywhere from a 2% to a 7% CPU tax on a fully loaded system running mixed workload 4-CPU virtual machines on recent hardware.
The virtualization tax has always varied a lot with the type of workload, the number of virtual machines, the number of virtual CPUs per machine and your hypervisor type. The reason people have been willing to pay the tax is that virtualization is just a better way to manage systems: system utilization is higher because you can pack workloads together while still maintaining hardware-guaranteed security isolation; hardware upgrades are trivial because the guest OSes always run on a consistent virtual hardware layer; image management is trivial; and neat tricks like shared copy-on-write memory mean that you can actually use fewer resources in a virtualized environment. Best of all, you get a consistent container for managing your workload, no matter what you end up having to put in it.
However, many people still look at the Googles and Yahoos of the world, who designed their architectures when the virtualization tax was high, and say “Google doesn’t believe in virtualization, so maybe I shouldn’t.” So, let’s dive into the issue of the virtualization “tax.”
There are really two aspects of performance that you have to consider when you look at virtualization. The first is, for a given workload, what level of work do you get done in a virtualized environment vs. a native environment? The second is, for a given level of work done, how much flexibility do you lose?
Take the example of a highly i/o bound workload. Let’s say a native environment performs 10M disk i/o’s per time period, and a virtual environment performs 9.8M. Should you consider that a 2% overhead? Yes. But what if I were to tell you that under the native environment, CPU utilization was 20%, while under the virtual environment, it was 30%? Should you consider that a 50% overhead? What’s the right number, 2% or 50%?
The rule here is that you always look at your limiting factor. If you’re burning more of the non-limiting factor by being virtualized, then you don’t really care—it wasn’t being used anyway. So 2% is the right number. But there’s a caveat: what if your workload changes so that you have a CPU intensive workload in another thread? Should you care that 10% of your CPU time is being burned by virtualization?
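To make the arithmetic concrete, here is a minimal sketch using the numbers from the example above. The function name and the split into "higher is better" vs. "lower is better" metrics are my own illustration, not taken from any benchmark suite.

```python
def overhead(native: float, virtual: float, higher_is_better: bool) -> float:
    """Fractional overhead of the virtualized run relative to native."""
    if higher_is_better:
        # Throughput-style metric: overhead is the share of work lost.
        return (native - virtual) / native
    # Utilization-style metric: overhead is the extra resource burned.
    return (virtual - native) / native


# Disk i/o is the limiting factor in the example, so this is the number to quote.
io_overhead = overhead(10.0e6, 9.8e6, higher_is_better=True)

# CPU is not the limiting factor, so this number only matters if the workload changes.
cpu_overhead = overhead(0.20, 0.30, higher_is_better=False)

print(f"i/o overhead: {io_overhead:.0%}")
print(f"CPU overhead: {cpu_overhead:.0%}")
```

The same measurement run yields both figures; which one you report depends entirely on which resource is actually capping the workload.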
The history of modern virtualization is a history of engineers eating an elephant: taking each performance bottleneck in turn and tackling it, knowing the things they can change and the things they can’t, and having the wisdom to know the difference. Over time, as virtualization-friendly features have spread to every part of the IT stack, the most insuperable barriers to virtualization performance (oddities in the Intel architecture, OS limitations, uncooperative NICs) have been addressed one by one, until this year (yes, just this year) the last serious performance barriers were finally addressed.
But first, the history. Before the dawn of modern virtualization, there were lots of emulators out there that emulated one operating system on top of another. But because every OS call had to be emulated in software, they were slow. More importantly, if they were running on top of Windows, they were dependent on Microsoft Windows not changing its behavior from patch to patch, which of course was a terrible bet. Virtualization was different: most instructions didn’t have to be emulated. Unless they were accessing memory or an i/o device (or were one of a handful of badly behaved instructions), they could simply be passed down directly to the CPU. That drastically increased performance and, critically, removed the dependence on the host operating system’s API.
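The difference between emulating everything and emulating only the awkward cases can be sketched as a toy model. This is a conceptual illustration only: the instruction names are hypothetical stand-ins, not real x86 opcodes, and no actual monitor is structured as a Python loop.

```python
# Hypothetical toy instruction names; the SENSITIVE set stands in for
# instructions that touch devices or memory-mapping state.
SENSITIVE = {"out", "load_cr3"}


def run_guest(instructions):
    """Count how many instructions run directly vs. get emulated by the monitor."""
    direct = 0
    emulated = 0
    for op in instructions:
        if op in SENSITIVE:
            emulated += 1  # the monitor steps in and emulates the effect in software
        else:
            direct += 1    # passed straight down to the CPU
    return direct, emulated


guest_code = ["add", "mul", "add", "out", "cmp", "jmp", "load_cr3", "add"]
direct, emulated = run_guest(guest_code)
print(f"{direct} direct, {emulated} emulated")
```

A pure emulator would take the slow path for all eight instructions; in this toy model only two of them need it.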
In the Beginning Was Disco
The seminal project inaugurating this generation of x86 virtual machines was the Disco project at Stanford, which published its key paper in 1997. That project (three of the four authors were future founders of VMware) built a virtual machine monitor for the Irix operating system running on the FLASH research processor.
The performance characteristics were reasonable for the systems of the day: overheads of 3% to 36% for a single VM on memory/CPU intensive tasks. But the really interesting thing about the paper was that total system output with eight VMs on an 8-CPU system almost doubled vs. native, because on native hardware, Irix was not very effective at scheduling work across 8 cores.
The Stone Age: VMware Workstation
VMware was founded in 1998 and in 1999 it released VMware Workstation 1.0. As a desktop product, it ran on Windows and Linux and allowed people to run other operating systems on top of either. By this stage, VMware engineering had tuned the core virtual machine monitor so that memory and CPU intensive workloads were pretty fast compared to native with some exceptions. On the other hand, networking-intensive workloads had fairly terrible performance.
The reasons for the overheads were outlined by some of the VMware engineering team in a 2001 Usenix paper. The paper gamely showed that with several optimizations it was possible to get full native throughput for networking workloads (10/100 BaseT), although the amount of CPU work spent to process that workload was about 4x the work required in the native environment. The paper also pointed out several possible further optimizations.
The Bronze Age: Hypervisors
One of the optimizations suggested was a custom kernel that would cut the amount of interrupt handling (a major cause of CPU overhead) in half by bypassing a host operating system. The ESX Server project was already in full swing by that stage, and when the product came out, it had two big innovations—the vmkernel, a kernel built from scratch to run guest OS’s, and VMFS, a highly simplified extent-based file system for fast disk access.
The benchmarks for ESX Server were a huge improvement over the host-based Workstation. ESX Server 1.0 could basically process a 10/100 networking workload with about a 10-20% CPU burn. One of the things working in virtualization’s advantage has always been that Moore’s Law would give it more CPU cycles every year, so the fixed overhead of processing a particular workload decreased proportionally over time. However, as customers shifted to GigE networking during 2003, benchmarks vs. native took a nose-dive. On the server hardware of the time, GigE workloads were CPU-limited at about 300 Mb/s for an average packet size. Basically, you could saturate your CPU just processing network traffic. (To be fair, on the hardware of the time the CPU burn was also very high on native hardware.)
The Iron Age: Paravirtualization and Virtual SMP
The next jump in performance was the introduction of paravirtualization by the Xen open source team and multi-CPU virtual machines by VMware. The Xen team patched Linux to get rid of some of the instructions that were most problematic to virtualize. The first Xen software also included a high performance networking system (but I believe that system was later abandoned due to other issues, although hopefully someone with better Xen knowledge could chip in with more details).
Meanwhile, VMware was introducing the first multi-CPU guest virtual machines. This was a long performance optimization task. In early stages of development in 2002, Virtual SMP achieved about 5% of native performance, but over the course of 18 months of steady performance optimization, it got to about 75% of native, and it shipped at around that performance level. Around the same time (about early 2004), Intel shipped the first generation of VT, slightly ahead of AMD’s equivalent. Ironically, this initially decreased VMware performance on some workloads, and VT did not enjoy a lot of adoption. A great backgrounder on the impact of VT is Ole Agesen’s primer from VMworld 2007.
The Silicon Age: Virtual I/O
Since 2005, VMware and Xen have gradually reduced the performance overheads of virtualization, aided by the Moore’s Law doubling in transistor count, which inexorably shrinks overheads over time. AMD’s Rapid Virtualization Indexing (RVI – 2007) and Intel’s Extended Page Tables (EPT – 2009) substantially improved performance for a class of recalcitrant workloads by moving the mapping between Guest OS “physical” memory pages and machine-level pages from software to silicon. In the case of operations that stress the MMU, like an Apache compile with lots of short lived processes and intensive memory access, performance doubled with RVI/EPT. (Xen showed similar challenges prior to RVI/EPT on compilation benchmarks.)
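Why moving that mapping into silicon helps MMU-heavy workloads can be sketched with two dictionaries. This is a deliberately simplified model under my own assumptions: real page tables are multi-level structures walked by hardware, and the page numbers here are made up.

```python
# Guest OS view: guest virtual page -> guest "physical" page.
guest_page_table = {0: 7, 1: 3, 2: 5}
# Hypervisor view: guest "physical" page -> machine page.
hypervisor_map = {3: 42, 5: 17, 7: 99}


def build_shadow(guest_pt, host_map):
    """Software approach: precompute a combined virtual -> machine table.

    The monitor has to redo this work whenever the guest edits its page table.
    """
    return {vpage: host_map[ppage] for vpage, ppage in guest_pt.items()}


def nested_translate(vpage, guest_pt, host_map):
    """RVI/EPT-style approach: follow both mappings at lookup time."""
    return host_map[guest_pt[vpage]]


shadow = build_shadow(guest_page_table, hypervisor_map)

# The guest remaps virtual page 1, as happens constantly when short-lived
# processes come and go.
guest_page_table[1] = 5

stale = shadow[1]                                             # combined table is out of date
fresh = nested_translate(1, guest_page_table, hypervisor_map)  # two-step lookup is current
print(stale, fresh)
```

In the software scheme every guest page table change means trapping into the monitor to repair the combined table; in the nested scheme the guest edits its own table freely, which is the case that compile-style workloads hit over and over.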
Some of the other performance advances have included interrupt coalescing, IPv6 TCP segmentation offloading and NAPI support in the new VMware vmxnet3 driver. However, the last year has also seen two big advances: direct device mapping, enabled by this generation of CPUs (e.g. Intel VT-d, first described back in 2006), and the first generation of i/o adapters that are truly virtualization-aware.
Before Intel VT-d, 10GigE workloads topped out, CPU-limited, at around 3.5 Gb/s of throughput. Afterwards (and with appropriate support in the hypervisor), throughputs above 9.6 Gb/s have been achieved. More important, however, is the next generation of i/o adapters that actually spin up mini-virtual NICs in hardware and connect them directly into virtual machines, eliminating the need to copy networking packets around. This is one of the gems in Cisco’s UCS hardware, which tightly couples a new NIC design with matching switch hardware. We’re now at the stage that if you’re using this year’s VMware or Xen technologies, Intel Nehalems and Shanghai Opterons and the new i/o adapters, virtualization has most performance issues pretty much beat.
Common Attribution Problems
So why then do people attribute chronic performance problems to virtualization? Well, sometimes they’re comparing apples and oranges, new hardware to old. And sometimes they’re not comparing limiting factors. A sysadmin will sometimes pack virtual machines on a machine until CPU utilization hits 75%, without realizing that he’s run out of i/o capacity way before that. And sometimes it’s true. Running hundreds of multi-CPU VMs on a single machine still probably wastes a lot of CPU cycles, but in that case the alternative of putting all those guest operating systems on separate servers is probably a very expensive idea. And I have to imagine (without evidence, but just looking at trends) that performance overheads for 8+ vCPU virtual machines are still not all that great. But in most cases, the tax seems to be worth it.
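The sysadmin mistake above is easy to check for if you look at every resource rather than just CPU. Here is a rough sketch; the capacity and per-VM demand figures are invented for illustration and the resource names are hypothetical.

```python
def max_vms(host_capacity, per_vm_demand):
    """Return (VM count, limiting resource) for a host.

    Both arguments are dicts keyed by resource name. The host fits only as
    many VMs as its scarcest resource allows.
    """
    limits = {
        resource: int(host_capacity[resource] // per_vm_demand[resource])
        for resource in per_vm_demand
    }
    limiting = min(limits, key=limits.get)
    return limits[limiting], limiting


# Made-up numbers: a 75% CPU utilization target and a disk i/o budget.
host = {"cpu_pct": 75.0, "disk_iops": 20000.0}
vm = {"cpu_pct": 2.5, "disk_iops": 1500.0}

count, resource = max_vms(host, vm)
print(f"fits {count} VMs, limited by {resource}")
```

With these numbers the CPU target alone would suggest 30 VMs, but the disk budget runs out at 13. Packing to the CPU number and then blaming virtualization for the slowdown is the attribution problem in a nutshell.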



I'm surprised you didn't mention HyperV at all, especially given the work to make IO performance competitive.
I have very little background or experience with Microsoft virtualization. I do, however, have plenty of experience with misleading Microsoft benchmark marketing (for example, the complete disgrace of Microsoft's Exchange benchmarks in the 90's) so I did not include them in this article.
@Itay
This is a pretty thin list of "Supported Guests" to include HyperV in a serious discussion around the state of virtualization: http://www.microsoft.com/windowsserver2008/en/us/...
I don't disagree, but it's beside the point. I think the architectural work that's been done in all the virtualization products to enable good performance is fascinating, and I don't think HyperV is any different in this regard. I don't make any claims about whether it is better; I'd just like to see it included in the discussion, and see how it compares (or even just be told that nothing they did was interesting – my experience is small compared to Michael's).
Finally, as a disclaimer, I work for Microsoft, though not for the HyperV team. I personally think it's a good example of a good product that's come out in recent times into a competitive market. However, I'm always happy to hear an educated discussion of such a field.
Thanks for an excellent educational article!!
You say "three of the four authors…" but there are only three authors on the Disco paper. Who is the fourth?
Well there is another version of the paper with Kinshuk Govil as a co-author — don't know the reason why there are two versions. I should probably ask.
http://qstream.org/~krasic/cs508-2006/summaries/p...
Never mind HyperV – not a mention of IBM's VM range – which has been providing complete virtualisation (complete enough to run VM under VM) since 1972, and has antecedents back to the CP-40 project in 1966?
http://en.wikipedia.org/wiki/VM_%28operating_syst...
http://en.wikipedia.org/wiki/IBM_CP-40
Those who don't know history, etc. etc.
There is a disclaimer saying "modern virtualization". VMware has always credited IBM very widely as the inventor of virtualization in the first place, but at this point, I think we can consider the 70's archaeology rather than contemporary history.
Considering that IBM's VM is still shipping on new machines today (z/VM), and is still capable of things that none of the "modern virtualization" systems are able to do, perhaps we shouldn't ignore it or be quite so quick to dismiss it.
Chris, the one time I tried to read IBM z/VM docs, I found them fairly impenetrable. They seem to assume a great deal of background in z/OS. It may be a very powerful system, but it doesn't seem very approachable. Although it would be pretty interesting to get JRuby running on the z/OS JVM and see it run Rails.
Very good posting, Michael. I have forwarded the link to several analysts and colleagues.
Where does Bochs fit into this conversation? I ran into Bochs long before I ever heard of VMware (and probably before VMware was a company, since this was the mid – late 90s, maybe around the time of the Disco project).
http://bochs.sourceforge.net/
Ugh. Ok, it looks like Bochs is an x86 simulator (technically), which explains why I was able to consider running it on Ultrix.
Nice article, but I'm pretty sure that the throughputs in the following paragraph are measured in Gb/s (gigabits/s) and not GB/s (gigabytes/s):
"Before Intel VT-D, 10GigE workloads became CPU-limited out at around 3.5GB/s of throughput. Afterwards (and with appropriate support in the hypervisor), throughputs above 9.6 GB/s have been achieved."
Thanks for the catch Alin. 50 dkp- for the proofreader (me)
As the product manager at a peer of EngineYard in Brazil, I'm really curious if processor power really is that important for you guys. In your environment, do you guys see your hosts consuming all the CPU to the point where it impacts performance?
Although I also manage a virtualization-based product, I'm with Google on this one as I don't believe that virtualization is the answer. I think it's a necessary evil. Operating Systems *as we know them today* must die. They are a landfill of leftover junk from a 35+ year operating systems feature arms race. Cloud Computing started in the 60s and the minicomputer took us on a 35-year detour (albeit a necessary detour), that we are only now returning from.
We don't need more processor power. Instead we need more parallelization and more applications that move the instructions to the data, instead of trying to bring the data to the instructions. That's why I think that Google has the right idea with AppEngine. That's why we have MapReduce/Hadoop.
Anyone that is working with the cloud should familiarize themselves with the Von Neumann Bottleneck (http://www.stanford.edu/class/cs242/readings/back...) and understand that CPU cycles nowadays are effectively free. The only exception I see is HPC applications that are calculation intensive, but these are the exception, not the rule.
I wouldn't even go as far as to say that the 70s are archeology as another poster mentioned. A lot of the problems we encounter today were solved in the 60s and 70s, but we've forgotten those lessons learned. That John Backus article I linked to is from 1977.
We also need more functional languages designed for parallelization, and we need to start teaching them to CS students so that they stop using "word-at-a-time" approaches to solving problems.
One of the biggest problems with virtualization is that we are permitting operating systems to persist. The OS was designed to own the hardware and manage the hardware. Doesn't the hypervisor do that as well? Why do we need two pieces of software to manage memory? When the OS starts up, it automatically assumes it owns all the memory given to it and, other than some awkward workarounds, there is no way to take that memory back and make it available to other programs in other VMs. If Eiji Toyoda investigated how IaaS clouds work today, he would certainly use the Japanese word "muda" (waste) to describe what we are doing.
The OS today, is nothing more than a very expensive runtime, and it is virtualization that is promoting the persistence of this waste. Market demand is the reason we have virtualization. Virtualization doesn't exist because it's the best technical solution.
Everyone working in the cloud needs to understand that "all this amazing architectural work" being done to get the most performance is simply a workaround for a CS bottleneck that has been with us for over 50 years.
Well I think your sentiments can be characterized sort of as "if we only wrote everything in LISP and Haskell and ran it on a mainframe, we'd be in better shape". True, but sort of beside the point. As Joyce wrote, "History is a nightmare from which I am trying to awake". But in the real world, history has given us a stock of human and physical capital that represents the slow and lately, more rapid accumulation of knowledge and energy. In software, it means that whatever solution we move to, has to leverage the last 35 years of software investment that companies have in-house doing useful work. For many pieces of software — there is no source code, or it's proprietary binaries with a defunct provider or the code is so impenetrable that no-one can figure it out. Virtualization is a lowest common denominator solution that can put even the worst piece of historical software garbage into a nice container with management dials and levers with guaranteed isolation enforced by hardware. That's why it works, but yes, it ain't pretty.