<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Engine Yard Blog &#187; Michael Mullany</title>
	<atom:link href="http://www.engineyard.com/blog/author/michaelmullany/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.engineyard.com/blog</link>
	<description></description>
	<lastBuildDate>Tue, 07 Feb 2012 19:36:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>LDAP Directories: The Forgotten NoSQL</title>
		<link>http://www.engineyard.com/blog/2009/ldap-directories-the-forgotten-nosql/</link>
		<comments>http://www.engineyard.com/blog/2009/ldap-directories-the-forgotten-nosql/#comments</comments>
		<pubDate>Thu, 17 Dec 2009 18:00:56 +0000</pubDate>
		<dc:creator>Michael Mullany</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Key-Value Stores]]></category>
		<category><![CDATA[LDAP]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=3019</guid>
		<description><![CDATA[<p>When most Rails developers encounter LDAP, it's usually for user authentication. And most of the time, there's no choice: they're working under a dictate that requires them to use it. Usually, this means Active Directory, but very occasionally something like OpenLDAP or the Sun Java System Directory Server.</p>
<p>It's hard to imagine now, but there was once great excitement about the potential for LDAP based directory servers to become more than just authentication servers and morph into general purpose datastores.  LDAP directories promised a single, scalable, high performance data store that could be queried for common information across multiple applications. After all, directories had a lot of virtues:</p>
<ul>
<li><strong>Fast Queries:</strong> LDAP directories were heavily indexed, so query speeds were truly impressive—reliably 10x what a relational database could manage. (Write speed was much slower for the same reason: lots of indexes to update when a write happened)</li>
<li><strong>Replication:</strong> LDAP directories were an "eventually consistent" data store long before Dynamo or <a href="http://www.engineyard.com/blog/2009/cassandra-and-ruby-a-love-affair/">Cassandra</a>. Multi-master replication allowed a distributed network of directories to accept writes at any node, and then relay these updates around the directory network. The last update in time always won.</li>
<li><strong>Partitionable: </strong>directories were giant tree structures, and branches could be picked up and moved to another server if the directory got too big. There was built-in referential linking from each amputation point to the correct server, and these servers could be easily geographically distributed.</li>
<li><strong>Standardized and efficient: </strong>coming from a telecom heritage, LDAP was an efficient wire protocol. It was globalized and cross-system. LDAP queries and responses were binary encoded using the Basic Encoding Rules (BER), with ASN.1 as the data representation syntax.</li>
</ul>
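<p>To make "the last update in time always won" concrete, here is a minimal sketch of a last-writer-wins merge. The <code>Update</code> structure, the node names and the tie-break on node name are assumptions made for illustration; they are not taken from any particular directory server.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """One write to a directory attribute (hypothetical structure)."""
    timestamp: float  # time the write was accepted
    origin: str       # name of the node that accepted it
    value: str


def merge(local: Update, incoming: Update) -> Update:
    """Keep the later write. Exact ties are broken by node name (an
    assumption of this sketch) so every replica picks the same winner."""
    return max(local, incoming, key=lambda u: (u.timestamp, u.origin))


# Two masters accept conflicting writes to the same attribute.
a = Update(timestamp=100.0, origin="ldap-east", value="555-0100")
b = Update(timestamp=105.0, origin="ldap-west", value="555-0199")

# Whatever order the updates arrive in, both replicas converge.
print(merge(a, b).value)
print(merge(b, a).value)
```

<p>Because the merge gives the same answer regardless of arrival order, replicas can relay updates to each other lazily and still end up in agreement.</p>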
<p>In addition to these benefits, directories like Netscape Directory Server and Microsoft Active Directory had a seemingly endless list of other features like rich, complex configurable access control rules and permissions; multiple ways to define groups; rich query semantics and more.</p>
<p>And yet, when we look around today, it's not LDAP directories that have the NoSQL buzz; it's the far looser and simpler key-value stores like Cassandra, MongoDB and Redis. So where did LDAP fall down, and is there anything to be learned from its (relative) failure? Here is my own take on why LDAP didn't take over the world, colored by my (brief) tenure as a product manager for Netscape Directory Server.</p>
<ol>
<li><strong>Telecom protocols FTL</strong>: LDAP, in my own humble opinion, was fatally crippled by its telecom parentage. Just reading the first page of the ASN.1 data structure specification could make your eyes bleed. Debugging a badly behaved LDAP client or query was basically a job for experts wielding binary-to-text crackers. There was a separate text format, LDIF, for representing directory data in human-readable form, but converting back and forth was a friction point. Compared to ASN.1, JSON (as an example) is severely limited and incomplete, and yet... about 1000x more popular as a result.</li>
<li><strong>Access control that exceeded human brain capacity: </strong>LDAP directories provided lots of rope for people who cared about security to firmly and irrevocably tie themselves in knots. Time and again, I'd see customers with five or more layers of access control rules that they found confounding, with counter-intuitive effects. Better yet, this level of complexity was indecipherable by anyone without drawing five-dimensional set diagrams. Sometimes, there are features you shouldn't put into a product no matter how much people ask for them. They know not what they do.</li>
<li><strong>Interesting data wanted to be relational:</strong> it was a simple but sad truth. Data that's interesting and important enough to be accessed often by your applications seems to want to be compared and operated on in the context of your other interesting data; that sounds a lot like the right case for a relational database. Directories, as a hierarchical data store, couldn't easily accommodate the kinds of queries that customers ended up wanting to do once they were storing enough interesting data. So the solution was to patch in "relationy" features like aliases, which soft-linked two values in different parts of the tree, but these were patchwork solutions. In their worst (over-used) incarnation, they turned a directory server into a weird, hard-to-maintain mutant hybrid of relational and hierarchical database.</li>
</ol>
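<p>To give a feel for the readability gap described in the first point, here is a rough sketch that encodes one attribute/value pair as nested BER tag-length-value elements and prints it next to the JSON equivalent. The pair is made up and is not a real LDAP message, and the helper only handles the short-form length, so treat it as an illustration rather than an encoder.</p>

```python
import json


def ber_tlv(tag: int, content: bytes) -> bytes:
    """Encode one BER element with a single-byte tag and a short-form
    length (content under 128 bytes), which is all this sketch needs."""
    if len(content) >= 128:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([tag, len(content)]) + content


OCTET_STRING = 0x04  # universal tag for OCTET STRING
SEQUENCE = 0x30      # universal, constructed tag for SEQUENCE

# A made-up attribute/value pair, not a real LDAP message.
pair = ber_tlv(SEQUENCE,
               ber_tlv(OCTET_STRING, b"cn") + ber_tlv(OCTET_STRING, b"Ann"))

print(pair.hex(" "))              # what you would see on the wire
print(json.dumps({"cn": "Ann"}))  # the same information as JSON
```

<p>The binary form is compact and unambiguous, but you need a decoder just to look at it, which is the friction described above.</p>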
<p>There were other downsides to LDAP directories, of course. The learning curve could be steep, since LDAP was a truly novel technology for most people used to RDBMSs and SQL. And probably most importantly, most directories weren't open source, and so they missed the opportunity to fully leverage a community of interested developers and administrators.</p>
<h2>Lessons for this Generation of NoSQL (?)</h2>
<p>I hesitate to speculate on the lessons from LDAP for this generation of NoSQL stores, since open source has changed the game considerably in the last ten years. That said, I <em>do</em> think LDAP got a lot of things right (fast, distributable, scalable and standardized). It's arguable whether custom binary protocols (such as MongoDB's) will really hurt adoption as long as the data structure specifications are reasonably readable, but Couch's JSON/REST/HTTP combo is certainly a little easier on the eyes.</p>
<p>I do know one thing: keep the access control simple. Your users will thank you later!</p>
<p><a href="http://www.engineyard.com/blog"><img height="98" width="61" title="logo-engineyard" alt="" class="attachment-post-thumbnail wp-post-image" src="http://www.engineyard.com/blog/wp-content/uploads/logo-engineyard.png"/></a></p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/ldap-directories-the-forgotten-nosql/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Introducing FasterScripts.com! Share Your Performance Experiences</title>
		<link>http://www.engineyard.com/blog/2009/introducing-fasterscripts-com-share-your-performance-experiences/</link>
		<comments>http://www.engineyard.com/blog/2009/introducing-fasterscripts-com-share-your-performance-experiences/#comments</comments>
		<pubDate>Thu, 03 Dec 2009 23:54:54 +0000</pubDate>
		<dc:creator>Michael Mullany</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=2996</guid>
		<description><![CDATA[<p><img src="http://eyweb-images.s3.amazonaws.com/FasterScripts.com.jpg" alt="FasterScripts.com home page" width="596" height="258" /></p>
<p>Today, we're introducing <a href="http://www.fasterscripts.com">fasterscripts.com</a>, a site for web producers and developers to share information on poor performance by third-party services that they have to embed in their web applications. We wanted to create a nice simple application that would allow people to band together to ask third-party services (like ad serving networks, click counters, marketing trackers etc.) to optimize their performance. Fasterscripts.com allows you to report specific services that are performing slowly (with an uploaded screenshot of the slow load from Firebug or Page Speed), and tag them for specific performance problems (like not gzipping content or having broken ETags).</p>
<h2>Why does it matter?</h2>
<p>Fast sites matter a lot. <a href="http://www.engineyard.com/community/railsroadshow">Faster sites get more visits, more page-views per visit and generate more revenue</a>. Above a certain threshold, slow sites pay more for Google AdWords, and there's a very good possibility that slow sites will soon be <a href="http://searchengineland.com/site-speed-googles-next-ranking-factor-29793">deranked in Google's search engine rankings</a>. (Google, in particular, seems to be on a campaign to speed up website performance to native client levels.) So if you're someone who makes their living developing web applications, you should care very much about performance, and you should care about the speed of third-party services.</p>
<h2>Where did this come from?</h2>
<p>When we were doing the <a href="http://www.engineyard.com/blog/2009/rails-in-the-wild-5-client-side-performance-observations/">performance research</a> for our <a href="http://www.railsroadshow.com/">Rails Performance in the Cloud Roadshow</a>, we took a sample of 100 Rails sites and ran them through Firebug and the <a href="http://developer.yahoo.com/yslow/">YSlow front-end performance analyzer</a>. What we found about average response times was not that surprising: 3.2 seconds for cold-start home page loads.</p>
<p>What we found about the poor performers, the sites that were taking 10 or 20 seconds to load, was <strong>very</strong> interesting. In pretty much every case, these super-slow response times were caused by poorly responding third-party services. And it wasn't just the small guys: in many cases it was big services like Facebook Connect, Google Analytics and DoubleClick that were serving up content slowly and sub-optimally.</p>
<p>Thankfully, at least Google Analytics shouldn't be a problem soon. Yesterday, Google announced an <a href="http://code.google.com/apis/analytics/docs/tracking/asyncTracking.html">asynchronous option</a> for Google Analytics. This will provide a way to avoid page load stutter caused by its previous synchronous-only "document.write" implementation. But there are still lots of third-party services out there that cause performance problems for otherwise well-optimized sites. (Our own favorite sub-optimal bugbear here at Engine Yard is the trackalyzer script from LeadLander, which we use for tracking site visits.)</p>
<h2>This sounds neat, how do I get involved?</h2>
<p>Go to fasterscripts.com and report slow services. If a service has already been reported, you can "vote it up" in the rankings. Once we have a certain number of reports for a particular service, our plan is to email the company with a list of reporters, and ask them to optimize their service, with tips on how to do it. If this gets momentum, we'll probably be looking for volunteers to help manage the site, so <a href="mailto:info@fasterscripts.com">let us know</a> if you'd be interested in helping out. If you have suggestions on how to improve the site, let us know as well.</p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/introducing-fasterscripts-com-share-your-performance-experiences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Programming Contest! And the Challenge is&#8230;Measure Rails Momentum</title>
		<link>http://www.engineyard.com/blog/2009/programming-contest-and-the-challenge-is-measure-rails-momentum/</link>
		<comments>http://www.engineyard.com/blog/2009/programming-contest-and-the-challenge-is-measure-rails-momentum/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 01:25:50 +0000</pubDate>
		<dc:creator>Michael Mullany</dc:creator>
				<category><![CDATA[Contests]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Contest]]></category>
		<category><![CDATA[Rails]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=2855</guid>
		<description><![CDATA[<p>We announced the <a href="http://www.engineyard.com/blog/2009/win-a-motorola-droid-programming-contest-worst-app-server-technology-ever/">Worst App Server Ever (WASE) contest</a> last week, as the second in a series of Engine Yard programming contests. Since then, we've heard lots about your efforts to put a <a href="http://github.com/dougal/wase_endpoint/">basic twitterbot</a> together, and now the time has come to describe the challenge computation. Remember: you have until 5 p.m. Monday to complete your calculations!</p>
<p><span style="color: #ff0000;">UPDATE: Challenge calculation submissions should be in the form of a RETWEET to @engineyard of the first message in your WASE program from your home Twitter account. The final WASEpoint in your program listing should be @eycontest (so we can measure who finished first). You can <a href="http://www.engineyard.com/contests/wase/">register your WASEpoints here</a>.</span></p>
<p>People often talk about the momentum of intangible things, but it's always been pretty much impossible to define. No longer! In this contest, you'll be measuring the momentum of Rails. How will you accomplish this, you ask?</p>
<p>Well, since Rails <em>is</em> its community, if we can measure the mass, speed and direction of the Rails community, then (clearly!) we can establish its momentum. For the purposes of this contest, and to make things easy, we will use as a rough proxy that the Rails community is everyone following <a href="http://www.twitter.com/dhh">@dhh</a> on twitter.</p>
<p>You must perform the following tasks to establish Rails momentum:</p>
<p>1) Establish the mass of the Rails community: defined as looking up the locations of all <a href="http://www.twitter.com/dhh">@dhh</a> followers from Twitter, geo-coding their locations, and then multiplying the number of people in each location by the average body mass of an adult in that country.</p>
<p>2) Establish the current location of the Rails community: defined by taking the locations and body weights from step 1, and calculating the community's center of mass (its centroid).</p>
<p>3) Establish the speed and direction of the Rails community: defined by taking its current location (calculated in Step 2) and comparing it to the origin of Rails. Rails 1.0 was released in Chicago on December 14th, 2005, so in the approximately 1,430 days since then, its location has moved. This means we can calculate its average speed and direction over the last four years.</p>
<p>Although we will expect the answer in the form of JSON object properties, the answer (in free text) to the contest would look like:</p>
<ul>
<li>Rails Momentum is 15 metric ton-kilometers per hour with a bearing of 120 degrees. Its current location is latitude 38.898748° and longitude -77.037684°.</li>
</ul>
<h2>Guidelines and Suggestions for Implementors</h2>
<p>Please refer back to our <a href="http://www.engineyard.com/blog/2009/win-a-motorola-droid-programming-contest-worst-app-server-technology-ever/">earlier post</a> for rules on how your entry must be structured and submitted. This description goes into details on the format of the input data and provides suggestions for WASEpoints and tips on what to avoid. A significant meta-challenge is agreeing on intermediate object formats with your fellow contestants! We have not specified what they should be.</p>
<h3>1) Calculating the mass of the Rails community</h3>
<p>The input data set has an array of all the Twitter IDs currently following <a href="http://www.twitter.com/dhh">@dhh</a>. Twitter limits your API access to 150 calls per hour unless you are a Twitter white-listed developer, so there is a clear opportunity for a white-listed developer to create a popular WASEpoint. If you're <em>not</em> a white-listed developer, be prepared to collaborate with others to pool your Twitter requests via WASEpoint chaining. We will make an exception to the standard WASEpoint rule of "no state at a WASEpoint" here, and allow people to create WASEpoints for others that provide a cached lookup of the location property of each follower.</p>
<p>Only about 50% of Twitter user profiles have standard formatted locations in the form of [City, State], [State, City], [State], [City, Country], [Country, City] or [Country]. Location data that lacks this format should be discarded, and the corresponding followers ignored. There's an opportunity here to write a WASEpoint that cleans and standardizes this data set for others to use. We do <em>not</em> expect you to geocode imprecise location tags such as "The Midwest" or "Somewhere in the Clouds".</p>
<p>There are many REST-accessible geocoding services on the web. Both Yahoo and Google have REST APIs, and there are other responsive geocoding services with Ruby clients. We include a list of standardized countries and the average adult bodyweight as part of the input dataset for use in calculations. (It was actually quite surprising how many sources we had to consult to get this data, and it's mostly only accurate for OECD countries; for most other countries, the weights are simply based on an average BMI.) Bodyweights are in kilograms. This is another module of work that would make a nice WASEpoint.</p>
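<p>Once followers have been resolved to countries, the mass step is just a weighted count. Here is a minimal sketch; the country list and the weights are invented for illustration, and the real layout of the input files may well differ.</p>

```python
from collections import Counter

# Hypothetical stand-ins for the contest inputs. The values are made up
# and the real files' layouts may differ.
follower_countries = ["United States", "United States", "Denmark", None]
avg_weight_kg = {"United States": 80.0, "Denmark": 75.0}


def community_mass_kg(countries, weights):
    """Sum people-per-country times average adult body mass, skipping
    followers with no usable location or no weight entry."""
    counts = Counter(c for c in countries if c in weights)
    return sum(n * weights[country] for country, n in counts.items())


print(community_mass_kg(follower_countries, avg_weight_kg))
```

<p>Followers whose location was discarded are represented here as <code>None</code> and simply drop out of the total.</p>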
<h3>2) Calculating the current location</h3>
<p>Calculating a center of mass for a sphere like the earth would basically put Rails in the center of the earth, but we didn't want to imply that Rails has gone to hell. Instead you are allowed to assume a flat-earth when calculating the centroid of the community, and the center of this flat mapping should be at Chicago's longitude and the equator's latitude. This latitude/longitude is provided as part of the input data set.</p>
<p>This puts the "edge" of this map approximately in western China, so India is to the "east" of Chicago, but Bangladesh would be "west" of Chicago. In this map, the center of mass for a developer in India and a developer in Bangladesh would not be on their border, it would be in the Caribbean. There is a clear opportunity here to write two WASEpoints: one that geocodes correctly and one that calculates centroids correctly.</p>
<p><img class="alignnone size-full wp-image-2885" title="Programming contest sample world map" src="http://eyweb-images.s3.amazonaws.com/contest_map.png" alt="Programming contest sample world map" width="510" height="398" /></p>
<p>(If anyone wants to tackle calculating a true 3D centroid with a projection back to the nearest point on the earth's surface, that would certainly be a strong candidate for third prize (best WASEpoint).)</p>
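<p>One possible reading of the flat-earth rule, sketched below: measure every longitude as degrees east or west of Chicago, take the mass-weighted average on that flat map, then convert back. The center longitude is assumed from the Chicago coordinates given in this post, and the two test points are made up to sit either side of the map edge.</p>

```python
CHICAGO_LON = -87.65  # assumed from the "long 87 39" (west) figure in this post


def recenter(lon, center_lon=CHICAGO_LON):
    """Express a longitude as degrees east (+) or west (-) of the map
    center, wrapped into the range [-180, 180)."""
    return (lon - center_lon + 180.0) % 360.0 - 180.0


def flat_centroid(points, center_lon=CHICAGO_LON):
    """Mass-weighted mean of (lat, lon, mass_kg) tuples on the flat map,
    returned as an ordinary (latitude, longitude) pair."""
    total = sum(m for _, _, m in points)
    lat = sum(la * m for la, _, m in points) / total
    x = sum(recenter(lo, center_lon) * m for _, lo, m in points) / total
    lon = (x + center_lon + 180.0) % 360.0 - 180.0
    return lat, lon


# Two equal masses placed five degrees either side of the map edge.
edge = CHICAGO_LON + 180.0
print(flat_centroid([(20.0, edge - 5.0, 70.0), (20.0, edge + 5.0, 70.0)]))
```

<p>Although the two points are only ten degrees apart on a globe, on this map they sit at opposite edges, so their centroid lands back on the center longitude.</p>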
<h3>3) Establishing the speed and direction</h3>
<p>Rails 1.0 was announced in Chicago (lat 41° 54′ N, long 87° 39′ W) on Dec 14th, 2005. If you establish that the center of the community is now in (for example) Albany, New York (lat 42° 45′ N, long 73° 48′ W), then it has traveled approximately 1,140 kilometers (about 710 miles) in 1,430 days, giving it an average land velocity of about 0.033 km/hr (unladen), and an initial bearing of about 80°. For this part of the calculation we DO expect you to consider the earth spherical, but there are a number of handy math guides on the internets to help you calculate bearings from latitude and longitude data. This conversion alone could make a nice WASEpoint for others to use.</p>
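<p>For the spherical part, a sketch along these lines should do: the haversine formula for distance and the standard forward-azimuth formula for the initial bearing. The mean earth radius of 6,371 km is an assumption of the sketch, and the coordinates are the Chicago and Albany figures above converted to decimal degrees.</p>

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; an assumption of this sketch


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance by the haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Compass bearing at the start point, in degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = (math.cos(p1) * math.sin(p2)
         - math.sin(p1) * math.cos(p2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


chicago = (41.9, -87.65)  # lat 41 54 N, long 87 39 W
albany = (42.75, -73.8)   # lat 42 45 N, long 73 48 W
days = 1430

d = distance_km(*chicago, *albany)
print(f"distance: {d:.0f} km")
print(f"speed:    {d / (days * 24):.3f} km/hr")
print(f"bearing:  {initial_bearing_deg(*chicago, *albany):.1f} degrees")
```

<p>Note that the bearing changes along a great-circle path, so the bearing on arrival will differ from the one computed at the starting point.</p>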
<h2>The Input Data Object</h2>
<p>We have posted the input data for the challenge on the web. We're giving you two JSON files in UTF-8 format.</p>
<ul>
<li><a href="http://assets.engineyard.com/wase/TwitterList.json">TwitterList.json</a> = An array of the Twitter IDs of everyone following <a href="http://www.twitter.com/dhh">@dhh</a></li>
<li><a href="http://assets.engineyard.com/wase/WeightbyCountry.json">WeightbyCountry.json</a> = An array of countries with their bodyweight values.</li>
</ul>
<h2>The Output Data Object</h2>
<ul>
<li>We will accept output data in whatever data format your final WASEpoint emits, but it must be obvious to us what the momentum number is and what the bearing is. We would, however, suggest the following JSON format (with sample data):</li>
</ul>
<p>{"RailsMomentum":{"Momentum":{"kg meters per second":15},"location":[{"latitude":333333,"longitude":333333}],"bearing":{"degrees":36,"minutes":45,"seconds":45}}}</p>
<p>Simple enough, right? Well, no one ever said it was going to be easy! In fact, we tried to structure this so you'd have to compete as part of a team! Good luck! We can't wait to see how it goes!</p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/programming-contest-and-the-challenge-is-measure-rails-momentum/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Nginx Security Vulnerability: SSL Man in the Middle Attack</title>
		<link>http://www.engineyard.com/blog/2009/nginx-security-vulnerability-ssl-man-in-the-middle-attack/</link>
		<comments>http://www.engineyard.com/blog/2009/nginx-security-vulnerability-ssl-man-in-the-middle-attack/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 18:30:35 +0000</pubDate>
		<dc:creator>Michael Mullany</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Nginx]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=2804</guid>
		<description><![CDATA[<p>A <a onclick="javascript:pageTracker._trackPageview('/outbound/article/seclists.org');" href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-3555">security vulnerability affecting all versions of nginx</a> (as well as several other web servers) has been reported. The flaw is in SSL/TLS session renegotiation: a man-in-the-middle attacker can inject plaintext of their choosing at the start of a victim's SSL session, and the server treats it as if it came from the legitimate client. A <a href="http://sysoev.ru/nginx/patch.cve-2009-3555.txt">patch</a> has been released for this vulnerability.</p>
<p>Engine Yard customers have already been contacted via email about this issue. For Engine Yard Cloud customers, this patch will be automatically applied the next time you perform a deploy. All other customers should open a support ticket to arrange an appropriate maintenance window with support.</p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/nginx-security-vulnerability-ssl-man-in-the-middle-attack/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Win a Motorola DROID Programming Contest: &#8220;Worst App Server Technology Ever&#8221;</title>
		<link>http://www.engineyard.com/blog/2009/win-a-motorola-droid-programming-contest-worst-app-server-technology-ever/</link>
		<comments>http://www.engineyard.com/blog/2009/win-a-motorola-droid-programming-contest-worst-app-server-technology-ever/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 15:30:26 +0000</pubDate>
		<dc:creator>Michael Mullany</dc:creator>
				<category><![CDATA[Contests]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[actors]]></category>
		<category><![CDATA[Contest]]></category>
		<category><![CDATA[Dataflow]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=2710</guid>
		<description><![CDATA[<p>The goal of this contest is to collaborate with your fellow contestants to build the "worst app server ever" (WASE), and use it to complete one or more challenge computations. The challenge computation(s) and their input data-set(s) will be announced and posted next week on Thursday, November 12. The contest will remain open until Monday, November 16th at 6pm PST.  Winners will be announced within the following week.</p>
<p>(<span style="color: #ff9900;">Update: </span>We think the rule-set below is now complete, but we still welcome any suggestions or tweaks that you might have.)</p>
<p>There will be three prizes.</p>
<ul>
<li>The first prize (a Motorola DROID and $1,000 of Engine Yard Cloud credit) goes to the person who completes our challenge task correctly first.</li>
<li>Second prize (a DROID and $500 of cloud credit) goes to the person who builds the most popular WASE endpoint (the one used the most often in the most submissions).</li>
<li>Third prize (a DROID) goes to the "best" WASE endpoint written in Ruby (as determined by us). The contest DROIDs are full price, non-contract-linked, US models.</li>
</ul>
<h2>How WASE Works</h2>
<p>Why is WASE the world's worst app server technology? Well, instead of a sane message bus like AMQP, WASE uses Twitter as its message bus. Instead of a proper message router, WASE uses a list of twitter accounts as its program listing. And instead of encapsulating data with each message, WASE messages only contain a reference to JSON objects or arrays at input and output location(s) specified by a bit.ly.</p>
<h2>"This Sounds Like the World's Worst App Server. Tell Me More."</h2>
<p>Well, here is an example of how a sample computation might work. Let's say @engineyard wants to take a JSON file containing an array of names, and get a list of the top quartile of names after sorting the array. We know that there are two Twitter accounts (which we will henceforth call WASEpoints), @ey-sort and @ey-firsthalf, that can be useful to us. We know @ey-sort takes an input data set, sorts it and outputs the result. We also know that @ey-firsthalf takes an input data set, and outputs the first "half" of the dataset. To perform a computation, we set up URIs for the program listing, the input data and the output data, and kick off the computation with an appropriate message. (For those of you with dataflow or actor-based programming experience, WASE should look like a vague but disreputable cousin.)</p>
<p>So let's go through the message flow:</p>
<p>We'll put our program listing at:</p>
<p><a href="http://www.engineyard.com/top-quartile-sorted-list.json">www.engineyard.com/top-quartile-sorted-list.json</a> (bit.ly/7yQK6), whose body contents are a JSON array:</p>
<p>["@ey-sort", "@ey-firsthalf", "@ey-firsthalf", "@engineyard"]</p>
<p>We put our input data here:  <a href="http://www.engineyard.com/unsortedmegalist.json">www.engineyard.com/unsortedmegalist.json</a> (bit.ly/3kl0xs)</p>
<p>And set up a location for our output data here:  <a href="http://www.engineyard.com/top25percentofmymegalist.json">www.engineyard.com/top25percentofmymegalist.json</a> (bit.ly/2uhGcl)</p>
<p>Or to summarize the <a href="http://code.google.com/p/bitly-api/wiki/ApiDocumentation">bit.ly's</a>,</p>
<p>Program listing: bit.ly/7yQK6 (read with an HTTP GET)</p>
<p>Output location: bit.ly/2uhGcl (written with an HTTP PUT)</p>
<p>Input location: bit.ly/3kl0xs (read with an HTTP GET)</p>
<p>To perform the computation, we'd simply send the following twitter message from our @engineyard account: @ey-sort #wase, 0, bit.ly/7yQK6, 1256850843, bit.ly/2uhGcl, bit.ly/3kl0xs</p>
<p>So the message format of a WASE message is: [WASEpoint] [WASE hashtag], [Program Counter (0 initially)], [Program listing URI], [<a href="http://www.unixtimestamp.com/index.php">Unix Timestamp</a>], [Output URI] [, Input URI (optional)] [, Input URI 2 (optional)]</p>
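<p>A parser for this format can be quite small. The sketch below follows the comma placement used in the example messages in this post (no comma between the WASEpoint and the hashtag); the <code>WaseMessage</code> class and <code>parse_wase</code> function are names invented here, not part of any contest tooling, and error messages are not handled.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WaseMessage:
    wasepoint: str         # account the message is addressed to
    counter: int           # program counter
    listing_uri: str
    timestamp: int         # Unix time
    output_uri: str
    input_uris: List[str]  # zero, one or two input locations


def parse_wase(text: str) -> WaseMessage:
    """Parse a message such as
    '@ey-sort #wase, 0, bit.ly/7yQK6, 1256850843, bit.ly/2uhGcl, bit.ly/3kl0xs'.
    """
    head, *fields = [part.strip() for part in text.split(",")]
    wasepoint, hashtag = head.split()
    if hashtag.lower() != "#wase" or len(fields) < 4:
        raise ValueError(f"not a WASE message: {text!r}")
    counter, listing, timestamp, output, *inputs = fields
    return WaseMessage(wasepoint, int(counter), listing,
                       int(timestamp), output, inputs)


msg = parse_wase("@ey-sort #wase, 0, bit.ly/7yQK6, 1256850843, "
                 "bit.ly/2uhGcl, bit.ly/3kl0xs")
print(msg)
```

<p>When <code>input_uris</code> comes back empty, the WASEpoint would read its input from <code>output_uri</code>.</p>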
<p>In the case of this computation, the message and computation sequence would look like:</p>
<ul>
<li>@engineyard sends: "@ey-sort #wase, 0, bit.ly/7yQK6, 1256850843, bit.ly/2uhGcl, bit.ly/3kl0xs"</li>
</ul>
<p>.... @ey-sort reads the message from @engineyard in its twitter list and parses the message. First it GETs the program listing from bit.ly/7yQK6, GETs the input data set from bit.ly/3kl0xs, sorts it, PUTs the output to bit.ly/2uhGcl, then looks for the 0+1 WASEpoint in the program listing (@ey-firsthalf) and then...</p>
<ul>
<li>@ey-sort sends: "@ey-firsthalf #wase, 1, bit.ly/7yQK6, 1256850875, bit.ly/2uhGcl"</li>
</ul>
<p>.... @ey-firsthalf reads the message from @ey-sort in its twitter list and parses the message. First it GETs the program listing from bit.ly/7yQK6, GETs the input data set from bit.ly/2uhGcl, halves it, PUTs the output to bit.ly/2uhGcl, then looks for the 1+1 WASEpoint in the program listing (@ey-firsthalf) and then...</p>
<ul>
<li>@ey-firsthalf sends (to itself): "@ey-firsthalf #wase, 2, bit.ly/7yQK6, 1256850885, bit.ly/2uhGcl"</li>
</ul>
<p>... etc. ...</p>
<ul>
<li>@ey-firsthalf sends: "@engineyard #wase, 3, bit.ly/7yQK6, 1256850899, bit.ly/2uhGcl"</li>
</ul>
<p>--- finally @engineyard receives this message with the pointer to the final location of output data.</p>
<p>A few new things here. There's a program counter that tells the WASEpoint where in the program listing the computation is, and there's a Unix timestamp (could be useful for discarding messages that get held up in the twitterverse?). If no input URI is specified, then the WASEpoint should use the Output URI as both Input and Output locations. One restriction that we will enforce for the contest, to avoid infinite looping, is that a WASEPOINT MAY NOT DECREMENT A PROGRAM COUNTER. (<span style="color: #ff9900;">Update: </span>And a WASEpoint must conserve the program listing and output URIs from the input to the output message.)</p>
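<p>Putting the counter rule into code, the hand-off a WASEpoint performs after finishing its own step might look like the sketch below. The function name and its parameters are invented for illustration, and the program listing is assumed to have been fetched and parsed already.</p>

```python
import time


def next_message(counter, listing, listing_uri, output_uri, now=None):
    """Build the message a WASEpoint sends after finishing its step.

    `counter` is the program counter from the incoming message and
    `listing` is the already-fetched program listing (a list of account
    names). The counter only ever moves forward.
    """
    nxt = counter + 1
    if nxt >= len(listing):
        raise IndexError("program listing has no further WASEpoint")
    stamp = int(time.time()) if now is None else now
    # The result was PUT to output_uri, so no input URI is passed on:
    # the next WASEpoint reads from and writes to the output location.
    return f"{listing[nxt]} #wase, {nxt}, {listing_uri}, {stamp}, {output_uri}"


listing = ["@ey-sort", "@ey-firsthalf", "@ey-firsthalf", "@engineyard"]
print(next_message(0, listing, "bit.ly/7yQK6", "bit.ly/2uhGcl", now=1256850875))
```

<p>The program listing URI and output URI are copied through unchanged, and only the addressee, counter and timestamp differ from the incoming message.</p>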
<p>Hey, maybe we should have some basic error handling. Hmm, let's say @ey-firsthalf is expecting a standard JSON object but the input data fails to parse properly. Let's have it send the following message:</p>
<ul>
<li>@ey-firsthalf sends: "@engineyard, #wase, -1, bit.ly/7yQK6, 1256850885, bit.ly/2uhGcl"</li>
</ul>
<p>So the error message structure is: [<span style="color: #ff9900;">Update: </span>first WASEpoint in program listing], [WASE hashtag], [Negative of Program Counter], [Program listing URI], [Unix Timestamp], [Output URI] [, Input URI (optional)] [, Input URI 2 (optional)]</p>
<p>Note that there are no type declarations in the message format because the only data types supported by WASE are JSON objects and arrays.</p>
<h2>What are the Guidelines for the Contest?</h2>
<p>Apart from the message format and data guidelines above -- here are additional guidelines:</p>
<p>1. Each contestant may register no more than 5 WASE endpoints/twitter accounts. WASEpoints must be <a href="http://www.engineyard.com/contests/wase/">registered here</a> to be eligible for use. Your WASEpoints must follow <a href="http://twitter.com/eycontest">@eycontest</a>. This is also where you should go to pick and choose good WASEpoints for constructing your app. Each contest entry must use a minimum of 10 WASEpoints from at least four separate contestants, where each WASEpoints performs functionally significant data operations. <span style="color: #ff0000;">You must supply your own Output URI</span>!</p>
<p>2. Please do not submit WASEpoints whose twitter accounts you do not own :-) We do not want to encourage the business of spamming Ashton Kutcher with mysterious messages. We will test each submitted WASEpoint with a DM to make sure they are legitimate.</p>
<p>3. Source code for your WASEpoint must be posted to a public repository (e.g., Codaset, GitHub, SourceForge, Kenai) for other contestants to inspect :-) If observed behavior deviates from the posted code (that is, you have filed a prank WASEpoint), then your entry and all your WASEpoints will be removed from the registered list.</p>
<p>4. A WASEpoint must not store state, and may not rely on any state data other than the input data (of course, it's easy to generate a private data set programmatically, but this will also be considered state). Trivial WASEpoints (e.g. identity) will be disqualified. Triviality is hard to define, but you know it when you see it.</p>
<p>5. You must use bit.ly as your URL shortener (this makes building parsers easier for everyone, and bit.ly has an HTTP interface).</p>
<p>7. <span style="color: #ff0000;">UPDATE: </span><span style="color: #ff0000;">Challenge calculation submissions must be in the form of a RETWEET to @engineyard</span> <span style="color: #ff0000;">of the first message in your WASE program listing from your home twitter account. The final WASEpoint in your program listing should be @eycontest. </span>You must be following @engineyard with your home account in order to enter.</p>
<p>8. We must be able to reproduce your computation using your program listing and our own output URI.</p>
<p>9. We STRONGLY encourage people to write their WASEpoints in Ruby, but we'll also accept Perl, Scala and Python. Be prepared, though, for people to avoid your WASEpoint, since the common denominator among readers of this blog is that they know Ruby!</p>
<p>10. We may alter these guidelines along the way, based on your input and feedback, although their spirit and philosophy will remain.</p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/win-a-motorola-droid-programming-contest-worst-app-server-technology-ever/feed/</wfw:commentRss>
		<slash:comments>29</slash:comments>
		</item>
		<item>
		<title>Security vulnerability in nginx</title>
		<link>http://www.engineyard.com/blog/2009/security-vulnerability-in-nginx/</link>
		<comments>http://www.engineyard.com/blog/2009/security-vulnerability-in-nginx/#comments</comments>
		<pubDate>Thu, 29 Oct 2009 20:40:18 +0000</pubDate>
		<dc:creator>Michael Mullany</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Nginx]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=2755</guid>
		<description><![CDATA[<p>A <a href="http://seclists.org/fulldisclosure/2009/Oct/306">security vulnerability in nginx</a> has been reported. This vulnerability is exploited via a null pointer dereference, and although this has been characterized as a Denial of Service attack, we suspect that it can be exploited to execute arbitrary code. As such, it's important for all nginx users to <a href="http://article.gmane.org/gmane.comp.web.nginx.english/16422">upgrade or patch this vulnerability</a> as soon as practicable.</p>
<p>In the 0.8, 0.7, 0.6 and 0.5 nginx codelines, all versions prior to 0.8.15, 0.7.62, 0.6.39 and 0.5.38, respectively, are vulnerable.</p>
<p>Engine Yard customers have already been contacted via email about this issue. For Engine Yard Cloud customers, this patch will be automatically applied the next time you perform a deploy. All other customers should open a support ticket to arrange an appropriate maintenance window with support.</p>
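<p>If you want to check a fleet of servers against the fixed versions listed above, a small comparison helper does the job. This is only a sketch: <code>is_vulnerable</code> and the <code>FIXED</code> table are illustrative names, and the version string is assumed to be plain dotted numbers such as <code>0.7.61</code> (for example, as copied from the output of <code>nginx -v</code>).</p>

```python
# First fixed release in each codeline named in this advisory.
FIXED = {
    (0, 5): (0, 5, 38),
    (0, 6): (0, 6, 39),
    (0, 7): (0, 7, 62),
    (0, 8): (0, 8, 15),
}


def is_vulnerable(version: str) -> bool:
    """Return True if `version` predates the fix for its codeline."""
    parts = tuple(int(p) for p in version.split("."))
    fixed = FIXED.get(parts[:2])
    if fixed is None:
        # Codelines outside 0.5 to 0.8 are not covered by this advisory.
        raise ValueError("codeline not covered: " + version)
    return fixed > parts


for v in ("0.6.38", "0.7.62", "0.8.14"):
    print(v, "vulnerable" if is_vulnerable(v) else "patched")
```

<p>Comparing tuples of integers rather than raw strings avoids the classic trap where "0.7.9" sorts after "0.7.62".</p>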
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/security-vulnerability-in-nginx/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Announcement: Engine Yard Cloud Price Reduction</title>
		<link>http://www.engineyard.com/blog/2009/announcement-engine-yard-cloud-price-reduction/</link>
		<comments>http://www.engineyard.com/blog/2009/announcement-engine-yard-cloud-price-reduction/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 18:28:20 +0000</pubDate>
		<dc:creator>Michael Mullany</dc:creator>
				<category><![CDATA[Engine Yard Cloud]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Pricing]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=2731</guid>
		<description><![CDATA[<p>As many of you have heard, Amazon announced pricing changes this morning. Customers have been asking if Engine Yard pricing will be adjusted accordingly, so before you call your rep, here's your answer!</p>
<p>Effective November 1st, we will be reducing the prices that we charge for instance hours in Engine Yard Cloud, passing through the announced price reduction in Amazon EC2 pricing. These prices will be in effect automatically for the November billing cycle. New and existing customers will not have to do anything to activate the new pricing; you will get the price reduction automatically, starting with your November usage.</p>
<p>This table displays the current and new pricing for Engine Yard Cloud instance hours:</p>
<table style="border-collapse: collapse;" border="0" cellspacing="0" cellpadding="0" width="228">
<tbody>
<tr height="13">
<td width="78" height="13"></td>
<td style="text-align: right;" width="75"><span style="text-decoration: underline;">Current</span></td>
<td style="text-align: right;" width="75"><span style="text-decoration: underline;">Nov 1st<br />
</span></td>
</tr>
<tr height="13">
<td height="13">Small</td>
<td style="text-align: right;"><span> </span>$0.160</td>
<td style="text-align: right;"><span> </span>$0.145</td>
</tr>
<tr height="13">
<td height="13">Large</td>
<td style="text-align: right;"><span> </span>$0.480</td>
<td style="text-align: right;"><span> </span>$0.420</td>
</tr>
<tr height="13">
<td height="13">XL</td>
<td style="text-align: right;"><span> </span>$0.900</td>
<td style="text-align: right;"><span> </span>$0.780</td>
</tr>
<tr height="13">
<td height="13"></td>
<td></td>
<td></td>
</tr>
<tr height="13">
<td height="13">High CPU Med</td>
<td style="text-align: right;"><span> </span>$0.270</td>
<td style="text-align: right;"><span> </span>$0.240</td>
</tr>
<tr height="13">
<td height="13">High CPU XL</td>
<td style="text-align: right;"><span> </span>$0.900</td>
<td style="text-align: right;"><span> </span>$0.780</td>
</tr>
</tbody>
</table>
<p>Our main website and pricing pages will be updated on November 1st to reflect these new prices. As always, we're here to address any questions or comments, and look forward to hearing from you!</p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/announcement-engine-yard-cloud-price-reduction/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Rails in the Wild: 5 Client-Side Performance Observations</title>
		<link>http://www.engineyard.com/blog/2009/rails-in-the-wild-5-client-side-performance-observations/</link>
		<comments>http://www.engineyard.com/blog/2009/rails-in-the-wild-5-client-side-performance-observations/#comments</comments>
		<pubDate>Tue, 13 Oct 2009 17:30:30 +0000</pubDate>
		<dc:creator>Michael Mullany</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[CSS]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Rails]]></category>
		<category><![CDATA[Rails Performance]]></category>
		<category><![CDATA[S3]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=2618</guid>
		<description><![CDATA[<p>We're putting together the presentation materials for our five-city "Rails Performance in the Cloud" Roadshow at the end of the month (<a href="http://www.railsroadshow.com/location-boston.html">Boston</a>, <a href="http://www.railsroadshow.com/location-austin.html">Austin</a>, <a href="http://www.railsroadshow.com/location-seattle.html">Seattle</a>, <a href="http://www.railsroadshow.com/location-los-angeles.html">LA</a> and <a href="http://www.railsroadshow.com/location-chicago.html">Chicago</a>). We'll be presenting some findings on page load performance for a casual survey sample of 100 North American Rails web-sites. Since some of the findings are not what you'd expect, we thought we'd share some of these early findings with you (come to the <a href="http://www.railsroadshow.com">Roadshow</a> and be the first to hear the rest of the analysis!).</p>
<h2>1. It's easy to forget to compress your JavaScript and CSS</h2>
<p>It seems like people are pretty good about gzipping their HTML and images, but for whatever reason, a lot of people forget to tell <code>mod_deflate</code> to compress JavaScript and CSS files. JavaScript payloads are becoming a much bigger percentage of total downloads (even a majority of the payload for many sites), and they're blocking the rest of the page from loading.</p>
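<p>To get a feel for what's at stake, here's a rough Python sketch that gzips a JavaScript-like payload and reports the savings. The sample text is made up and far more repetitive than real code, so treat the number it prints as an illustration of the mechanism rather than as a figure from our survey.</p>

```python
import gzip

# Made-up stand-in for a JavaScript asset; real files will compress differently.
sample_js = "function add(a, b) { return a + b; }\n" * 200
raw = sample_js.encode("utf-8")

# Same DEFLATE-based gzip encoding that a web server applies on the fly.
compressed = gzip.compress(raw)

saved = 1 - len(compressed) / len(raw)
print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes, saved: {saved:.0%}")
```

<p>Running the same few lines against your own bundled JavaScript and CSS files is a quick way to see how much payload you'd save by turning compression on for those content types.</p>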
<h2>2. Watch out for slow third party services</h2>
<p>Some of the big outliers in page load performance are caused by poor response times from third-party services. Services like Google Ads and Analytics, Doubleclick, and Facebook Connect can kill your performance if you load them early, so almost all sites (sensibly) put them as late as possible in the page load. Response times of up to eight seconds from Google Analytics are not uncommon, so this can result in a big road-bump in your page load if it's in the wrong place. Google Analytics, in particular, uses document.write, so most people have to <a href="http://github.com/choonkeat/postload_google_ads">work around it</a>, but a significant minority of sites don't.</p>
<h2>3. Using multiple image hosts doesn't always mean higher performance</h2>
<p>There's nothing magical about using multiple image hosts. It <em>should</em> produce higher performance by allowing parallel downloads, but only if you've put the necessary resources to work. An interesting performance outlier in the survey was a site that had configured multiple image hosts, but response times from those hosts were multi-second -- probably a good sign that they were under-resourced.</p>
<h2>4. S3 is NOT a Webserver!</h2>
<p>Amazon S3 is a reliable, cheap storage service, but don't treat it like just another web-server. Response time in the wild was regularly between 0.5 and 1.5 seconds, so make sure that you're not serving performance sensitive content from it. And if you do, try to use pre-loading to hide latency. Unlike a regular web-server, S3 (still) does not gzip content, so you also need to use a pre-compression utility like Yahoo's Smush.It to reduce image sizes before you put them up there.</p>
<h2>5. Most performance variability is NOT attributable to page factors</h2>
<p>When we did the analysis, we found that less than half of total page-load time in our sample was attributable to front-end factors like the number of http requests made by the page, the size of the page payload and whether or not you were scaling images in HTML. A majority of performance variability (a little surprisingly) was attributable to stuff that had nothing to do with page construction (basically network and back-end factors).</p>
<p>To learn more about what we found about average page response time, page size targets and HTTP chatter, as well as what the analysis said about front-end performance, <a href="http://www.railsroadshow.com/register-now.html">sign up for the Rails Roadshow</a>, coming to a city near you in two weeks' time. We'll be presenting along with our partners New Relic, Soasta, Amazon, CVSDude and more.</p>
<p>See you there!</p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/rails-in-the-wild-5-client-side-performance-observations/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>10 Years of Virtual Machine Performance (Semi) Demystified</title>
		<link>http://www.engineyard.com/blog/2009/10-years-of-virtual-machine-performance-semi-demystified/</link>
		<comments>http://www.engineyard.com/blog/2009/10-years-of-virtual-machine-performance-semi-demystified/#comments</comments>
		<pubDate>Mon, 05 Oct 2009 20:30:03 +0000</pubDate>
		<dc:creator>Michael Mullany</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Virtualization]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=2374</guid>
		<description><![CDATA[<p>There are many opinions in the air about the impact that virtualization has on performance, so I thought a short blog would be good to explain (as best I can) virtual machine performance characteristics with pointers to relevant benchmarks and technical papers.</p>
<p>My background is that I was an early Product Manager working on VMware ESX Server (from version 1.5) and among other things ran product management for VMware for a few years. As a product management guy, I kept track of the output of the engineering performance group, and as a result had a reasonably high-level (although never code-level) understanding of the whys and wherefores of virtualization performance. Although I'm not as fresh on virtualization as I once was, I'll try to do my best here. I also want to thank Steve Herrod at VMware, and Simon Crosby at Citrix for providing a technical sanity check on the blog contents, although I retain responsibility for any mistakes and oversights.</p>
<p>First, a solid statement: virtualization has always levied a CPU "tax." Early on, this was very high, recently not so much. Probably the most comprehensive recent non-vendor benchmark of performance vs. native is AnandTech's, which recently showed <a href="http://www.anandtech.com/showdoc.aspx?i=3567&amp;p=10">anywhere from a 2% to a 7% CPU tax</a> on a fully loaded system running mixed workload 4-CPU virtual machines on recent hardware.</p>
<p>The virtualization tax has always varied a lot with the type of workload, number of virtual machines, number of virtual CPUs per machine and your hypervisor type. The reason people have been willing to pay the tax is that virtualization is just a better way to manage systems: system utilization is higher because you can pack workloads together while still maintaining hardware-guaranteed security isolation; hardware upgrades are trivial because the guest OSes always run on a consistent virtual hardware layer; image management is trivial; and neat tricks like shared copy-on-write memory mean that you can actually use fewer resources in a virtualized environment. Best of all, you get a consistent container for managing your workload, no matter what you end up having to put in it.</p>
<p>However, many people still look at the Googles and Yahoos of the world, who designed their architecture when the virtualization tax was high, and say "Google doesn't believe in virtualization, so maybe I shouldn't". So, let's dive into the issue of the virtualization "tax."</p>
<p>There are really two aspects of performance that you have to consider when you look at virtualization. The first is, for a given workload, what level of work do you get done in a virtualized environment vs. a native environment? The second is, for a given level of work done, how much flexibility do you lose?</p>
<p>Take the example of a highly i/o bound workload. Let's say a native environment gets 10M disk i/o's performed per time period, and a virtual environment gets 9.8M disk i/o's performed. Should you consider that a 2% overhead? Yes. But what if I were to tell you that under the native environment, CPU utilization was 20%, while under the virtual environment, the utilization was 30%? Should you consider that a 50% overhead? What's the right number, 2% or 50%?</p>
<p>The rule here is that you always look at your limiting factor. If you're burning more of the non-limiting factor by being virtualized, then you don't really care—it wasn't being used anyway. So 2% is the right number. But there's a caveat: what if your workload changes so that you have a CPU intensive workload in another thread? Should you care that 10% of your CPU time is being burned by virtualization?</p>
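<p>The arithmetic behind those two numbers is worth spelling out. The short Python sketch below uses only the figures from the example above; the <code>overhead</code> helper is a made-up name for illustration.</p>

```python
def overhead(native: float, virtual: float, higher_is_better: bool) -> float:
    """Relative cost of running virtualized, as a fraction of native."""
    if higher_is_better:
        # Throughput-style metric: how much work was lost.
        return (native - virtual) / native
    # Utilization-style metric: how much more resource was burned.
    return (virtual - native) / native


# Disk i/o's per time period (the limiting factor for this workload).
io_overhead = overhead(10.0e6, 9.8e6, higher_is_better=True)

# CPU utilization in percent (not the limiting factor).
cpu_overhead = overhead(20.0, 30.0, higher_is_better=False)

# Extra CPU burned, in percentage points of the whole machine.
extra_cpu_points = 30.0 - 20.0

print(f"i/o overhead: {io_overhead:.0%}")    # the 2% from the text
print(f"CPU overhead: {cpu_overhead:.0%}")   # the 50% from the text
print(f"extra CPU: {extra_cpu_points:.0f} points of total capacity")
```

<p>Both percentages are correct; they just measure different resources. The 10 points of extra CPU only start to matter once something else on the box wants that CPU.</p>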
<p>The history of modern virtualization is a history of engineers eating an elephant: taking each bottleneck to performance in turn and tackling it, knowing the things they can change and the things they can't, and having the wisdom to know the difference. Over time, as virtualization-friendly features have spread to every part of the IT stack, the most insuperable barriers to virtualization performance (oddities in the Intel architecture, OS limitations, uncooperative NICs) have been addressed one by one, until this year (yes, just this year), the last serious performance barriers to virtualization have finally been addressed.</p>
<p>But first, the history. Before the dawn of modern virtualization, there were lots of emulators out there that emulated one operating system on top of another. But because every OS call had to be emulated in software, they were slow. More importantly, if they were running on top of Windows, they were dependent on Microsoft Windows not changing its behavior from patch to patch—which of course was a terrible bet. But virtualization was different—most instructions didn't have to be emulated—if they weren't accessing memory or an i/o device (or were one of a handful of badly behaved instructions), they could simply be passed down directly to the CPU—drastically increasing performance, but also critically, bypassing the need for dependence on the host operating system's API.</p>
<h2>In the Beginning Was Disco</h2>
<p>The seminal project inaugurating this generation of x86 virtual machines was the Disco project at Stanford, which published <a href="http://www.cl.cam.ac.uk/~smh22/docs/disco_sosp.pdf">its key paper in 1997</a>. That project (three of the four authors were future founders of VMware) built a virtual machine monitor for the Irix operating system running on the FLASH research processor.</p>
<p>The performance characteristics were reasonable for the systems of the day: 3% to 36% overheads for a single VM on memory/CPU intensive tasks.  But the really interesting thing about the paper was that total system output with eight VMs on an 8-CPU system almost doubled vs. native, because on native hardware, Irix was not very effective at scheduling work across 8 cores.</p>
<h2>The Stone Age: VMware Workstation</h2>
<p>VMware was founded in 1998 and in 1999 it released VMware Workstation 1.0. As a desktop product, it ran on Windows and Linux and allowed people to run other operating systems on top of either. By this stage, VMware engineering had tuned the core virtual machine monitor so that memory and CPU intensive workloads were pretty fast compared to native with some exceptions. On the other hand, networking-intensive workloads had fairly terrible performance.</p>
<p>The reasons for the overheads were outlined by some of the VMware engineering team in a 2001 <a href="http://www.usenix.org/events/usenix01/sugerman/sugerman.pdf">Usenix paper</a>. The paper gamely showed that with several optimizations it was possible to get full native throughput for networking workloads (10/100 BaseT), although the amount of CPU work spent to process that workload was about 4x the work required in the native environment. The paper also pointed out several possible further optimizations.</p>
<h2>The Bronze Age: Hypervisors</h2>
<p>One of the optimizations suggested was a custom kernel that would cut the amount of interrupt handling (a major cause of CPU overhead) in half by bypassing a host operating system.  The ESX Server project was already in full swing by that stage, and when the product came out, it had two big innovations—the vmkernel, a kernel built from scratch to run guest OS's, and VMFS, a highly simplified extent-based file system for fast disk access.</p>
<p>The benchmarks for ESX Server were a huge improvement over the host-based Workstation. ESX Server 1.0 could basically process a 10/100 networking workload with about a 10-20% CPU burn. One of the things working in virtualization's favor has always been that Moore's Law would give it more CPU cycles every year, so the fixed overhead of processing a particular workload decreased proportionally over time. However, as customers shifted to GigE networking during 2003, benchmarks vs. native took a nose-dive. On the server hardware of the time, GigE workloads were being CPU-limited at about 300 Mb/s for an average packet size. Basically, you could saturate your CPU just processing network traffic. (To be fair, on the hardware of the time the CPU burn was also very high on native hardware.)</p>
<h2>The Iron Age: Paravirtualization and Virtual SMP</h2>
<p>The next jump in performance was the introduction of paravirtualization by the Xen open source team and multi-CPU virtual machines by VMware. The Xen team patched Linux to remove some of the instructions that were most problematic to virtualize. The first Xen software also included a high performance networking system (but I believe that system was later abandoned due to other issues, although hopefully someone with better Xen knowledge can chip in with more details).</p>
<p>Meanwhile, VMware was introducing the first multi-CPU guest virtual machines. This was a long performance optimization task. In the early stages of development in 2002, Virtual SMP achieved about 5% of native performance, but over the course of 18 months of steady performance optimization, it got to about 75% of native, and it shipped at around that performance level.  Around the same time (about early 2004), Intel shipped the first generation of VT technology, slightly ahead of AMD's equivalent. Ironically, this initially decreased VMware performance on some workloads, and VT did not enjoy a lot of adoption. A great backgrounder on the impact of VT technology is Ole Agesen's <a href="http://communities.vmware.com/servlet/JiveServlet/download/1147092-17964/PS%5FTA68%5F288534%5F166-1%5FFIN%5Fv5.pdf;jsessionid=3BF14921261F4C67AD33ADEF05C09420">primer from VMworld 2007</a>.</p>
<h2>The Silicon Age: Virtual I/O</h2>
<p>Since 2005, VMware and Xen have gradually reduced the performance overheads of virtualization, aided by the Moore's law doubling in transistor count, which inexorably shrinks overheads over time.  AMD's Rapid Virtualization Indexing (RVI - 2007) and Intel's Extended Page Tables (EPT - 2009) <a href="http://www.vmware.com/pdf/Perf_ESX_Intel-EPT-eval.pdf">substantially improved performance for a class of recalcitrant workloads</a> by offloading the mapping of machine-level pages to Guest OS "physical" memory pages, from software to silicon. In the case of operations that stress the MMU—like an Apache compile with lots of short lived processes and intensive memory access—performance doubled with RVI/EPT. (Xen showed similar challenges prior to RVI/EPT on <a href="http://www.xen.org/files/xensummitboston08/Deshane-XenSummit08-Slides.pdf">compilation benchmarks</a>.)</p>
<p>Some of the other performance advances have included interrupt coalescing, IPv6 TCP segmentation offloading and NAPI support in the new VMware vmxnet3 driver. However, the last year has also seen two big advances: direct device mapping, enabled by this generation of CPUs (e.g. <a href="http://www.intel.com/technology/itj/2006/v10i3/2-io/1-abstract.htm">Intel VT-D</a>, first described back in 2006), and the first generation of i/o adapters that are truly virtualization-aware.</p>
<p>Before Intel VT-D, 10GigE workloads became CPU-limited at around 3.5 Gb/s of throughput. Afterwards (and with appropriate support in the hypervisor), <a href="http://blogs.vmware.com/vmtn/2008/08/netqueue-vmdire.html">throughputs above 9.6 Gb/s have been achieved</a>. More important, however, is the next generation of i/o adapters that actually spin up mini-virtual NICs in hardware and connect them directly into virtual machines, eliminating the need to copy networking packets around. This is one of the gems in <a href="http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns892/ns894/white_paper_c11-525307_ps9902_Products_White_Paper.html">Cisco's UCS hardware</a>, which tightly couples a new NIC design with matching switch hardware. We're now at the stage where, if you're using this year's VMware or Xen technologies, Intel Nehalems or Shanghai Opterons, and the new i/o adapters, virtualization has most performance issues pretty much beat.</p>
<h2>Common Attribution Problems</h2>
<p>So why then do people attribute chronic performance problems to virtualization? Well, sometimes they're comparing apples and oranges, new hardware to old. And sometimes they're not comparing limiting factors. A sysadmin will sometimes pack virtual machines on a machine until CPU utilization hits 75%, without realizing that the machine ran out of i/o capacity way before that.  And sometimes it's true. Running hundreds of multi-CPU VMs on a single machine still probably wastes a lot of CPU cycles, but in that case, the alternative of putting all those Guest Operating Systems on separate servers is probably a very expensive idea. And I have to imagine (without evidence, but just looking at trends) that performance overheads for 8+ vCPU virtual machines are still not all that great. But in most cases, the tax seems to be worth it.</p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/10-years-of-virtual-machine-performance-semi-demystified/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Security Vulnerability in Nginx: Patch &amp; Upgrades Available</title>
		<link>http://www.engineyard.com/blog/2009/security-vulnerability-in-nginx-patch-upgrades-available/</link>
		<comments>http://www.engineyard.com/blog/2009/security-vulnerability-in-nginx-patch-upgrades-available/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 19:40:43 +0000</pubDate>
		<dc:creator>Michael Mullany</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Nginx]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=2323</guid>
		<description><![CDATA[<p>Today, nginx released <a href="http://www.nginx.net/">new versions</a> (0.6.39, 0.7.62, 0.8.15) and a patch to fix a remote execution security vulnerability in all versions of nginx.  Attackers exploiting this vulnerability can execute arbitrary code with the privileges of the nginx worker process or cause a denial of service by repeatedly crashing the process.</p>
<p>All instances created in the last week on <a href="http://www.engineyard.com/products/cloud">Engine Yard Cloud</a> already include a patch for this vulnerability. Older instances can apply this fix by simply performing a redeploy. Engine Yard customers have been contacted by email, and private cloud customers should coordinate with support to schedule an appropriate maintenance window for the upgrade.</p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/security-vulnerability-in-nginx-patch-upgrades-available/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

