“Why does everybody say that CPUs are fast nowadays and that ‘it doesn’t matter that language XYZ is slow’?
It does matter: web applications. If your applications can’t serve all the visitors, then you’re going to lose your customer or you’ll have to learn some other language with better performance.
Once our application serves 200 million page views each day… the language is really sensitive, so we go with C/C++.”
—ruby-talk Thread
Performance: it’s a topic that comes up over and over again in the Ruby world, and everyone’s got an opinion. Unfortunately, those opinions often focus on minutiae and tend to miss the big picture.
On top of that, discussions of performance in the Ruby world are far more complex today, because one really has to talk about Ruby performance in the context of a specific implementation. Are we talking about Matz’ Ruby 1.8.x, or 1.9.x? Are we talking about Rubinius or JRuby? What about MacRuby? IronRuby? MagLev? Every one of these has a different performance profile and level of completeness.
For the purposes of this post, and of the quotes above, I’m going to focus on Matz’ Ruby 1.8.x (MRI). It has been the Ruby for many years, and it’s what most people are pointing at when they complain about Ruby being slow. Don’t just take my word for it, though; check out The Computer Language Benchmarks Game for a substantial set of flawed micro-benchmarks using a plethora of different languages. What they call “Ruby MRI” is, at this time, ruby 1.8.7 (2009-06-12 patchlevel 174). It’s not even close to being the most recent version of 1.8.7, but that’s OK. The benchmarks there have to be taken with a couple of grains of salt, anyway.
Here’s why: micro-benchmarks for languages have only a weak relationship to the performance of complex systems implemented in those languages, even when those systems are implemented well. To put it another way, the speed at which a language can complete a simple, discrete task is not necessarily a strong predictor of how fast a complicated application, composed of many tasks, will perform when implemented in that language. Other factors can strongly influence overall performance: application architecture, for example, and the ability to leverage higher-level built-in capabilities that simplify things which may be complex to implement in other languages.
Many of you probably know people who claim Ruby can’t scale, or is too slow for business-critical web applications. Since you’re reading this, you also know those people are wrong. In fact, it’s usually far easier to scale a Rails application’s web-facing aspect than it is to scale the data storage parts of the application. Nonetheless, scaling that web-facing aspect has costs, and if your application can return content to your customers more efficiently, reducing your hardware needs, you reduce your costs.
Returning to the ruby-talk thread that those quotes came from, my response included an assertion that I thought I could spin up a single Engine Yard Cloud instance and, running it with an all-Ruby stack, push 200,000,000 requests through it in less than a day. When I say an all-Ruby stack, I’m not talking about the database layer, but rather the application and anything above it (such as the web server). I wouldn’t use Apache, nginx, or any other non-Ruby web server, and I’d use a real, complex application.
Since I already had a 64-bit, 4 ECU instance running that I use for testing Ruby 1.8.6 changes, I just used that existing instance. I used Ruby 1.8.6 pl287 for this. I could have used any version, as RVM makes it simple to pick and choose, but I selected that one because many sites have run on it for a long time (though if you are running on it now, you really should upgrade), and as a less-than-current version, it serves my point well.
For generating test traffic, I used the venerable Apache Bench. Even after all these years it still has some buggy corner cases, but it’s straightforward and easy to use, and its own performance is high enough that it takes some pretty fast test subjects before you start running into the performance limitations of the tool instead of the test subjects. I ran it on the same machine as my application’s stack because I wanted to eliminate the network as a factor in the results and just feed as many requests to my stack as quickly as possible.
The test application was Redmine, version 0.8.7. I selected Redmine because it’s a complex application familiar to many people, and it’s easy to install. It’s also not yet optimized for speed. Development has been far more focused on features and function than on optimizing for resource usage efficiency. The Rails version that I used is 2.3.2.
So, after installing and configuring Redmine, I started it:
ruby script/server -e production -d
Note that I did not use Mongrel, evented_mongrel, Thin, or anything else sophisticated as the container for the application. It was just WEBrick, and it was just a single instance of WEBrick.
I then threw some random data into it just so that there was something other than the empty pages. So, let’s see how it performed!
ab -n 10000 -c 1 http://127.0.0.1:3000/
Hmmm. I rode my exercise bike 1.3 miles while that ran… That didn’t feel fast at all.
Requests per second:    33.98 [#/sec] (mean)
Time per request:       29.432 [ms] (mean)
Time per request:       29.432 [ms] (mean, across all concurrent requests)
OK. I mean, that’s not horrible. Redmine isn’t a lightweight app, and that’s over 2.5 million requests a day on a single process. What happens if there’s some concurrency?
ab -n 10000 -c 25 http://127.0.0.1:3000/
Requests per second:    31.11 [#/sec] (mean)
Time per request:       803.707 [ms] (mean)
Time per request:       32.148 [ms] (mean, across all concurrent requests)
That was a 1.4 mile benchmark ride. Shoot; does that mean Ruby really is slow? That did not go in the direction we need, and let’s be real here: in a real application deployment, there are going to be concurrent requests—many of them, if you’re at all successful. It’s pretty clear what direction everything was moving in, but I wanted to take it one step further.
ab -n 10000 -c 500 http://127.0.0.1:3000/
Benchmarking 127.0.0.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
apr_socket_recv: Connection reset by peer (104)
Well, good to know. Clearly, Redmine running inside WEBrick can scale, but there are limits that aren’t too hard to hit on a single process. If we were spreading these requests over multiple processes on multiple instances, we could reasonably scale to many millions of requests per day, even running our code on WEBrick, assuming that the database layer could keep up with all of that. However, that’s still a long way from two hundred million requests per day.
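To put “a long way” in numbers, here is a small arithmetic sketch. It assumes a perfectly steady load around the clock, which real traffic never is, and the helper names are made up for illustration. Spread evenly over 86,400 seconds, 200 million requests a day works out to roughly 2,315 requests per second, about 68 times the single-process WEBrick figure above.

```ruby
# Hypothetical helpers for converting between sustained requests per second
# and requests per day. Assumes a perfectly flat 24-hour load.
SECONDS_PER_DAY = 24 * 60 * 60 # 86_400

# Requests served in a day at a constant rate.
def requests_per_day(requests_per_second)
  requests_per_second * SECONDS_PER_DAY
end

# Constant rate needed to serve a daily total.
def required_rps(daily_target)
  daily_target.to_f / SECONDS_PER_DAY
end

webrick_rps = 33.98                     # single WEBrick process, -c 1 run
target_rps  = required_rps(200_000_000) # the figure from the ruby-talk thread

puts "WEBrick, one process: #{requests_per_day(webrick_rps).round} requests/day"
puts "Needed for 200M/day:  #{target_rps.round(1)} requests/second"
puts "Shortfall factor:     #{(target_rps / webrick_rps).round(1)}x"
```

Real traffic has peaks, so the rate you actually have to sustain at busy times is higher than this average.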
Even if we were running on a Ruby implementation that was 2x as fast, or 5x as fast, and even if the application were running in a faster container, the basic problem is still the same—we’d have to throw hardware at it until the problem went away. Even if you spent a lot of time laboriously building Redmine in C++ while focusing on performance, you still wouldn’t escape the need, with this simple architecture, to throw hardware at the problem. So, what do you do if you need more throughput out of your application, but aren’t excited about adding more hardware resources?
Consider these runs:
ab -n 10000 -c 1 -C '_redmine_session=9ec759408f1ae3c6f919e50baba5a3dc; path=/' http://127.0.0.1/
Requests per second:    2839.37 [#/sec] (mean)
Time per request:       0.352 [ms] (mean)
Time per request:       0.352 [ms] (mean, across all concurrent requests)
ab -n 10000 -c 1000 -C '_redmine_session=9ec759408f1ae3c6f919e50baba5a3dc; path=/' http://127.0.0.1/
Requests per second:    3862.33 [#/sec] (mean)
Time per request:       258.911 [ms] (mean)
Time per request:       0.259 [ms] (mean, across all concurrent requests)
ab -n 100000 -c 25 -k -C '_redmine_session=9ec759408f1ae3c6f919e50baba5a3dc; path=/' http://127.0.0.1/
Requests per second:    7797.39 [#/sec] (mean)
Time per request:       3.206 [ms] (mean)
Time per request:       0.128 [ms] (mean, across all concurrent requests)
I barely had time to turn the cranks on the exercise bike for those runs! It turns out that to get that performance, I needed to look at my architecture and rethink how I was positioning my application’s web facing aspect. Most applications, even highly dynamic ones, show lots of the same stuff to the users. In many cases completely identical content is being displayed for many different users. It’s senseless to regenerate this content over and over again. This is where caching enters the architecture picture.
Rails 2 has some built-in support for caching. It will do page caching, which writes a static copy of a dynamically generated page to a persistent location, so that on subsequent hits the web server can deliver the page without involving Rails at all. This works great, but it has limitations.
All content, for everyone, for a given URL must be identical, and you’re responsible for providing a sweeper that clears old content. Also, requests will still fall down to your web server, which may mean that you still encounter some significant performance penalties when delivering your content in some situations. For example, nginx delivers static files quite quickly if it’s sitting on top of a fast disk. Sit it on a slow disk, though, and page caching returns limited dividends. If it can work for your application though, use it.
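The page caching idea, including the sweeper you are responsible for, fits in a few lines. What follows is a minimal sketch of the concept, not the Rails implementation; the `PageCache` class and its methods are hypothetical names chosen for illustration.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical page cache in the spirit of Rails 2 page caching: the rendered
# body for a URL path is written to disk, so a web server can serve it as a
# static file on later hits. Note: no sanitizing of ".." segments; a real
# implementation must reject paths that escape the cache root.
class PageCache
  def initialize(root)
    @root = root
  end

  # Map a URL path to a file under the cache root ("/" -> "index.html").
  def file_for(path)
    relative = path == "/" ? "index" : path.sub(%r{\A/}, "")
    File.join(@root, "#{relative}.html")
  end

  # Store a freshly rendered page.
  def write(path, body)
    file = file_for(path)
    FileUtils.mkdir_p(File.dirname(file))
    File.write(file, body)
  end

  # Return the cached body, or nil on a miss.
  def read(path)
    file = file_for(path)
    File.exist?(file) ? File.read(file) : nil
  end

  # The "sweeper": the application must call this whenever content changes.
  def expire(path)
    file = file_for(path)
    File.delete(file) if File.exist?(file)
  end
end

Dir.mktmpdir do |dir|
  cache = PageCache.new(dir)
  cache.write("/projects", "<h1>Projects</h1>") # first hit renders dynamically
  puts cache.read("/projects")                  # later hits get the static copy
  cache.expire("/projects")                     # sweeper runs after a change
  p cache.read("/projects")                     # nil until re-rendered
end
```

Every visitor gets the same file for a given URL, which is exactly why this approach breaks down once per-user cookies enter the picture.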
Rails also supports fragment (partial) caching in several guises: to the file system, to memory, to memcached, etc. Fragment caching can be a win architecturally, because it bypasses all of the heavy work involved in generating content; your app can just assemble pregenerated fragments into a complete page. If you haven’t done so, look into that as well. It can be very helpful.
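The core of fragment caching is a “fetch or render” operation with an expiry. This is a plain-Ruby sketch of that idea, assuming an in-memory store; it is not Rails’ cache API, and `FragmentCache` is a hypothetical name.

```ruby
# Hypothetical in-memory fragment cache (not Rails' API): expensive pieces of
# a page are rendered once, stored under a key, and reused until they expire.
class FragmentCache
  Entry = Struct.new(:value, :expires_at)

  # ttl is in seconds; the clock is injectable so expiry can be tested.
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @store = {}
  end

  # Return the cached fragment for key, or render it with the block and store it.
  def fetch(key)
    now = @clock.call
    entry = @store[key]
    return entry.value if entry && entry.expires_at > now

    value = yield
    @store[key] = Entry.new(value, now + @ttl)
    value
  end
end

# Assemble two pages; the sidebar fragment is only rendered for the first one.
cache = FragmentCache.new(ttl: 30)
renders = 0
pages = Array.new(2) do
  sidebar = cache.fetch("sidebar") { renders += 1; "<ul>recent issues</ul>" }
  "<html>#{sidebar}<main>issue list</main></html>"
end
puts pages.last
puts "sidebar rendered #{renders} time(s)"
```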
Along those same conceptual lines, there are also Edge Side Includes, or ESI. ESI essentially lets an application return a skeleton of a page, or an incomplete page with some special markup embedded. A proxy that receives that content and understands ESI markup can then insert content, either from its own cache or from a subrequest that it issues to some other URL.
This lets a proxy cache a generated, but incomplete page, yet still fill it out with smaller pieces of dynamically generated content without pushing all of that work back into the dynamic application. So it’s a bit like partial caching, but it’s handled at a shallower level in the stack. I’ve heard that Rails 3 will have a plugin to facilitate the use of ESI, and that it may come built in with a later dot release. Not all reverse caching proxies support ESI, but many of them do.
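This is a deliberately tiny sketch of the proxy side of ESI. It only handles `<esi:include src="..."/>` and uses a lambda to stand in for the subrequest; the real ESI language has much more (`esi:try`, `esi:choose`, variables), and the method names here are hypothetical.

```ruby
# Hypothetical, minimal ESI-style assembler: the cached skeleton contains
# <esi:include src="..."/> tags, and each one is replaced with content from
# the proxy's own cache or, on a miss, from a subrequest (here a lambda).
ESI_INCLUDE = %r{<esi:include\s+src="([^"]+)"\s*/>}

def assemble_esi(skeleton, fragment_cache, fetch_fragment)
  skeleton.gsub(ESI_INCLUDE) do
    src = Regexp.last_match(1)
    # Use the cached fragment if present; otherwise fetch and remember it.
    fragment_cache[src] ||= fetch_fragment.call(src)
  end
end

skeleton = <<~HTML
  <html><body>
  <esi:include src="/fragments/header"/>
  <p>Cached, shared body</p>
  <esi:include src="/fragments/recent_activity"/>
  </body></html>
HTML

cache = {}
subrequest = ->(src) { "<div>content for #{src}</div>" }
puts assemble_esi(skeleton, cache, subrequest)
```

A fragment that differs per user would need a per-user cache key (or no caching at all), just like a whole page would.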
For Redmine, page caching doesn’t work very well. Like many applications, it uses cookies. Applications can use cookies to identify users, to handle authentication, or to persist data in the user’s browser instead of on the server. When an application needs to deliver cookies in addition to content, simple page caching won’t work. Redmine falls into this category. And besides… I promised to use a Ruby stack, so leveraging nginx or Apache to serve files from a page cache would be cheating.
What I really needed was a caching reverse proxy that would sit in front of the application. It had to be smart enough to do the right thing with regard to caching content that has cookies attached (at least for some definition of the right thing), and it had to be stubborn enough to not quite follow the Cache-Control headers that Redmine set. It needed to be implemented in Ruby, and it had to be fast enough to be worthwhile.
Most caching reverse proxies are implemented in fast languages. Varnish, one of the fastest caching reverse proxies, is written in C. nginx, which can be configured to provide a caching reverse proxy, is also implemented in C, as is Squid, one of the oldest proxy servers. Traffic Server is implemented in C++.
Refer back to the benchmarks site. C is a lot faster than MRI Ruby. C++ is significantly faster, too. So, to borrow a phrase from my grandmother, how on God’s green Earth do I expect to write a proxy in Ruby that can compete with one in a language that benchmarks 100x-200x faster than it is?
Bullheaded stubbornness in the face of ignorance? Well, yes, a little bit, combined with some specific architectural decisions. Most of those proxies try to do everything. I think there are probably configuration options in Squid that would get it to cook breakfast for me. Traffic Server probably won’t cook breakfast for me, yet, but it will make the bed, and somewhere in the TODO, I’m sure they have plans to allow for it to make breakfast, too, if you can figure out how to configure it. Varnish is one of the fastest proxies, and it gets its speed, in large part, because it won’t make the bed or cook my breakfast. It’s like Charles Emerson Winchester III from M*A*S*H: “I do one thing at a time, I do it very well, and then I move on.” Varnish does still take some configuration education to get it to work well, though.
And that is the secret to keeping things fast. Or, at least one of the secrets, anyway. I took it one step further. My approach was:
Do one thing at a time, do it well enough, and then move on.
A couple of years ago I wrote a very fast proxy and simple web server in Ruby that I called Swiftiply. It leverages EventMachine for handling network traffic, and then tries to squeeze the rest of the performance that it needs out of Ruby by not providing any more capability than is really needed to get the job done. Someone once said that “No code is faster than no code.”
Swiftiply didn’t provide enough capability for a caching reverse proxy, but it did have the capability to serve and cache static assets very quickly (on a lot of hardware my benchmarking efforts have run up against Apache Bench’s own performance limits), and it did already function as a proxy, so much of the capability was there. One advantage to it being written in Ruby was that it was relatively straightforward for me to add additional capability to it. So I did.
To really handle Redmine properly requires the ability to cache different versions of the same URL, where the only differentiator is the cookies. Also, Redmine sets a Cache-Control header that looks like this:
Cache-Control: private, max-age=0, must-revalidate
Without digging into it deeply, this means that shared (public) caches should not cache the content, and private caches need to confirm with the server that they have valid content before using it. But we want to ignore that (unless Cache-Control is set to no-cache, in which case we’ll pay attention), because we do want to keep private content cached, and we do not want to go back to the application to revalidate on every request. My assumption is that it is OK if, for example, a new issue is added, but it takes a few seconds before a URL which shows the issues is refreshed to display that new issue.
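The two rules above are keying on the cookies and overriding Cache-Control except for no-cache. This is a small sketch of them. It is not Swiftiply’s actual code; the `CookieAwareCache` class, its method names, and the shape of the response hash are all assumptions made for illustration.

```ruby
# Hypothetical sketch of the caching policy described above (not Swiftiply's
# code): entries are keyed on method, URL, and Cookie header, and
# "private, max-age=0, must-revalidate" is deliberately ignored in favor of a
# short fixed TTL. Only a "no-cache" directive keeps a response out of the cache.
class CookieAwareCache
  Entry = Struct.new(:response, :expires_at)

  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @entries = {}
  end

  # Same URL with different cookies => different cache entries.
  def key_for(method, url, cookie)
    [method, url, cookie.to_s]
  end

  # True unless the Cache-Control header contains a no-cache directive.
  def cacheable?(cache_control)
    names = cache_control.to_s.split(",").map { |d| d.strip.split("=", 2).first.to_s.downcase }
    !names.include?("no-cache")
  end

  # Serve from cache if fresh; otherwise call the app (the block), which must
  # return a Hash like { headers: {...}, body: "..." }, and store GET responses.
  def fetch(method, url, cookie)
    key = key_for(method, url, cookie)
    now = @clock.call
    entry = @entries[key]
    return entry.response if entry && entry.expires_at > now

    response = yield
    if method == "GET" && cacheable?(response[:headers]["Cache-Control"])
      @entries[key] = Entry.new(response, now + @ttl)
    end
    response
  end
end

cache = CookieAwareCache.new(ttl: 30)
app_calls = 0
redmine = lambda do
  app_calls += 1
  { headers: { "Cache-Control" => "private, max-age=0, must-revalidate" }, body: "issues page" }
end

cache.fetch("GET", "/issues", "_redmine_session=abc") { redmine.call } # miss: app renders
cache.fetch("GET", "/issues", "_redmine_session=abc") { redmine.call } # hit
cache.fetch("GET", "/issues", "_redmine_session=xyz") { redmine.call } # other cookie: miss
puts app_calls # => 2
```

Keying on the Cookie header is what keeps one user’s page from being served to another. The cost is one entry per user per URL, and a user may see content up to one TTL old, which is the tradeoff accepted above.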
The end result is a caching reverse proxy with very few tuning knobs, and behavior that’s not quite HTTP/1.1 correct, but that is very fast, stable, and hackable. It’s probably not actually as fast as it could be, since I piggybacked the implementation onto something that’s doing more than I really need, but it’s good enough. Ruby, as a “slow” language, delivers something that runs very fast and is good enough for the goal that I had.
If you’re wondering how many requests were pushed through my Ruby stack in 24 hours:
Requests per second: 3283.09 [#/sec] (mean)
That’s 283,659,084 requests in 24 hours (and none of them were keepalive requests). All handled in a Ruby stack. All with a completely browsable and usable Redmine installation that was still responsive while the test was running; I added issues, edited them, removed them, and did administrative actions with no perceptible delays.
I readily admit that this isn’t a test that faithfully simulates real production loads; you probably aren’t going to roll out a production web app servicing two or three hundred million requests a day on a single modestly sized EY Cloud instance. But if you were doing something that wasn’t going to be bottlenecked by the data store, you just might be able to do it, all with slow, slow Ruby. Not bad.
It’s no Varnish, and it never will be. Varnish does far more, more correctly, and all a little bit faster. Varnish also requires some careful tuning to run well, and is not nearly so hackable, so there are tradeoffs. If you need more performance out of your application, look closely at what a caching reverse proxy can do for you. In the larger view of your application’s deployment architecture, it can make a tremendous difference in your users’ experience. Varnish is a great piece of software, and deserves a post of its own covering configuration and usage.
And if you truly find that you need some specialized capability, don’t be afraid to spike something out with Ruby. Paying a little attention to writing lean code that delivers just the capabilities that you need can result in surprisingly fast, capable code, even in a slow implementation of a slow language like Ruby ;)
Questions and comments welcome!


Thanks, really nice.
Great article, but isn't using EventMachine cheating as it breaks your self-imposed rule of using a pure Ruby stack? EM relies on C code compiled as part of its gem install.
http://github.com/eventmachine/eventmachine/tree/...
;-)
Your point is well taken though. With proper thought, and without artificial constraints on purity, a Ruby application can scale up or out. No production applications run on a pure Ruby stack, but that doesn't mean the application code can't be pure Ruby.
I thought about that, but Ruby uses C, too. A lot of it. Array is C. Hash is C. Numeric. Bignum. Regexp. String. And on and on. Ruby doesn't run on pure Ruby. Even Rubinius has a small crunch nugget in the center that's not Ruby. So it is a continuum, really.
Err, ruby doesn't run on pure ruby because it would be very slow. Also it would not be possible to do any syscalls as ruby <-> C is designed specifically for that.
Many times I see people do things like,
obj.func1.func2
and say it is very fast while both functions do everything in C. That does not mean ruby is fast. A simple synthetic benchmark measures how much overhead there is in the interpreter/VM. For example,
for( i=0;i<1000000; i=i+1)
array[i] = i;
and ruby chokes. It is similar with other interpreted languages (including python), though ruby tends to be slowest. Now, C/Java/J#(mono) are all reasonably comparable execution time (within a magnitude).
Sure, you can throw hardware at it, but if you need 120 machines to run a service in Ruby and 5 to run a service in Java/C++/.NET, it's a no-brainer.
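For readers who want to try the loop from the comment above themselves, this is a Ruby translation timed with the standard library’s `Benchmark` module. It is exactly the kind of micro-benchmark the article argues says little about a complex system. The numbers it prints depend entirely on the Ruby implementation, version, and machine, so none are claimed here; the method names are made up for this sketch.

```ruby
require "benchmark"

N = 1_000_000

# Literal translation of the C-style for loop from the comment.
def fill_with_while(n)
  array = []
  i = 0
  while i < n
    array[i] = i
    i += 1
  end
  array
end

# The idiomatic Ruby way to build the same array.
def fill_idiomatic(n)
  Array.new(n) { |i| i }
end

# Prints user/system/total/real times for each variant.
Benchmark.bm(10) do |x|
  x.report("while:")     { fill_with_while(N) }
  x.report("Array.new:") { fill_idiomatic(N) }
end
```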
You didn't read what I wrote, did you?
A for loop tells you nothing about the behavior of a complex system.
Rubinius IS actually a Ruby implementation that depends much more heavily on Ruby for the implementation of core capabilities, and on a large number of synthetic benchmarks, it is _quite_ fast. I don't have a link to the latest set of numbers handy, but your little for loop, once the interpreter is warmed up, runs 3.4x faster on Rubinius than on MRI 1.8.[67]. Check it out. Not that it's really at all relevant to my point.
Actually, I did read what you wrote. I read the entire article too. But it doesn't mean ruby is not slow. Sometimes slow doesn't matter, sometimes it matters. There are problems where using Ruby (or Python, PHP, etc) makes the solution unworkable. These solutions tend to be CPU/memory intensive.
I totally agree with you here. Just because Ruby isn't slow enough to be the bottleneck in all cases doesn't mean it's not slow. It is. All other arguments to the contrary are silly.
Sorry, but your argument disproves your point. Rubinius looks for opportunities to generate optimised Assembler code, not Ruby code. So basically it's an automagic C extension replacement on steroids. If you change the Ruby code at runtime you will hit performance penalties until optimised code is generated again.
I thought what you wanted to say is that performance is not just defined by the actual speed of the implementation (which in the end is always about fast machine code, because machines execute it) but rather by the overall design. This is essentially a no-brainer. No one will seriously dispute this. If, for example, caching were only possible using Ruby, Ruby would be far ahead of all other competitors. Well, it's not, but the fact that you can cache means the fact that Ruby is slow is not that important; it doesn't prove it wrong, though.
The real benefit of using Ruby comes to light in one of your side notes. An existing code base can be customized, reused, and combined into something new really quickly. That's hard to measure, but if you try to reuse C code it becomes evident.
Regarding Rubinius, I'd say that's irrelevant, but it's a pretty minor argument. Array, in Rubinius, is implemented in Ruby. The fact that the runtime optimizes the snot out of it is no different than the Java runtime optimizing the snot out of Java, or a C compiler that generates optimized assembler.
As for the rest…. Yep, pretty much. :)
Talking about rubinius is sort of like talking about Perl 6. Neither's viability is worth arguing until they're a completed product. Otherwise, no one is going to consider using them for production applications, so it simply doesn't matter.
Rubinius is at 1.0rc3. RubySpec compliance isn't quite 100%, but it's getting closer every day, and most Ruby code does work against it. Most existing MRI extensions will build and work against it, as well. There is only one blocker that I am aware of which prevents it from running Rails 3, and that is bound to be resolved soon. Rubinius is making great strides, and a 1.0 release is not far off anymore.
Sorry for the double reply, but I forgot to mention, too, that I know of dozens of production sites/apps running a Ruby-only stack in production which is why I felt pretty confident going down the road that I did when I conceived of this article.
Very nice article. Too many people throw the "Ruby is SLOW!" excuse at us Ruby programmers. Good to see a practical and a sensible benchmark.
ruby is slow.
Very nice article :)
What instance size did you use to perform your test?
–
Mayank
An m1.large instance:
7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
64-bit
Ok..
> It’s not even close to being the most recent version of 1.8.7
It's the latest 1.8.7.174-1ubuntu1 package provided for Ubuntu 9.10
(Building Ruby 1.9 from source is quite enough special attention for Ruby.)
> Micro-benchmarks for languages have only a weak relationship to the performance of complex systems implemented in those languages, even when implemented well.
Who would have guessed!
http://shootout.alioth.debian.org/flawed-benchmar...
The fact that you think 200,000 requests a -day- is impressive says everything that needs to be said about the piss poor performance of ruby.
Yeah, reading is only for those who passed elementary school. There are three more zeros, and the mention of "A single instance". And that's the second point you fail at: to scale means "I can throw more hardware at it and it becomes faster quickly" (the more linearly it speeds up, the better it scales).
But hey, that's too much reality for you I guess.
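The definition of scaling in the comment above can be made concrete as scaling efficiency: throughput with n units of hardware divided by n times the throughput of one unit, where 1.0 means perfectly linear scaling. This is a minimal sketch, and the measurements in it are hypothetical numbers invented for illustration, not results from the article.

```ruby
# Scaling efficiency: 1.0 means n units deliver exactly n times the
# throughput of one unit (linear scaling); lower values mean diminishing returns.
def scaling_efficiency(single_unit_rps, rps_with_n_units, n)
  rps_with_n_units.to_f / (n * single_unit_rps)
end

# Hypothetical throughput measurements (req/s) keyed by number of units.
measurements = { 1 => 34.0, 2 => 67.0, 4 => 130.0, 8 => 240.0 }

measurements.each do |n, rps|
  printf("%d unit(s): %6.1f req/s, efficiency %.2f\n",
         n, rps, scaling_efficiency(measurements[1], rps, n))
end
```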
wow! then by your definition, a program written in BASIC can scale too.
That's right, it sure can.
typical rails fanboy… redefine all terms until they are meaningless in order to talk around the glaring deficiencies of the framework.
"Don't argue with idiots. They'll drag you down to their level and beat you with experience."
reality is that reality is a fucking idiot in reality! go scale ureself with BASIC intuition ;)
The HTTP spec allows for extensions to Cache-Control, so it would be possible (or even smart) to remain conformant and add a couple of extension options that allow for the things you're talking about that aren't quite conformant. Like "private, max-age=0, app-max-age=2" to indicate that while external caches should not cache it at all, the private cache can cache for 2s. Or "private, max-age=0, app-shared-by=Cookie, app-max-age-shared=2" to indicate that the private cache can cache responses that share the same cookie headers for 2s.
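The suggestion works because HTTP requires caches to ignore Cache-Control directives they don't recognize, so shared caches keep obeying "private, max-age=0" while a cooperating proxy reads the extension. Here is a small sketch of the proxy side. "app-max-age" is the commenter's hypothetical directive, not a registered one, and the method names are made up for illustration.

```ruby
# Parse a Cache-Control header into a Hash of directive => value.
# Directives without a value (e.g. "private") are stored as true.
def parse_cache_control(header)
  header.to_s.split(",").each_with_object({}) do |part, directives|
    name, value = part.strip.split("=", 2)
    next if name.nil? || name.empty?

    directives[name.downcase] = value ? value.delete('"') : true
  end
end

# Seconds the application-side cache may keep the response (0 = don't cache),
# based on the hypothetical "app-max-age" extension directive.
def private_cache_ttl(header)
  directives = parse_cache_control(header)
  return 0 if directives["no-cache"] || directives["no-store"]

  value = directives["app-max-age"]
  value.is_a?(String) && value.match?(/\A\d+\z/) ? value.to_i : 0
end

header = "private, max-age=0, app-max-age=2"
p parse_cache_control(header)
p private_cache_ttl(header)
```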
Yeah. There are certainly better ways to do it than I did. No argument at all. I just wanted something that would effectively work for Redmine without me having to hack Redmine to deliver more useful cache control directives.
93% of my actions in my rails controllers in my rails apps have a for loop that runs 1 million times and writes to an array. Damnit, I better rethink my choice for an application language.
I'm pretty sure you don't believe me. And you'd be correct. My apps don't do retarded things that 'synthetic benchmarks' are testing against. I'll have my app to market 8 months before yours, and it'll be a little bit slower (at 3000 req/s) than yours. But hey, I'll have an 8 month head start, with 1/5th the personnel, which should give me a healthy financial advantage to tune where necessary. It's the simple case of don't optimize early. You'll have plenty of time (and money) later when performance really enters the picture.
3000+ requests per second on a single instance with ruby is perfectly fine with me. Thanks for the article Kirk
You know, there was a time when I would have agreed with you. However, experience has proven (at least to me) that Rails and Ruby productivity breaks down quickly as the project's complexity increases. The abuse of named args via hashes and open classes just really makes life miserable. You start to miss the very solid code navigation that a statically typed language gives you.
Your argument provided above was valid before frameworks like Play! (http://www.playframework.org) existed for Java. Play gives me everything, and I mean everything, in terms of productivity that Rails did, *plus* it gives me the blinding (compared to Ruby) speed of a Java stack. If you're doing rails but frustrated by speed and language, give play! a look. It really includes almost every feature Rails does, which I previously thought would be impossible from a Java framework. Also, it has Scala support, so you can still have nice closures, etc, just like in ruby, but faster.
"7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
64-bit "
Then the question becomes how does Ruby compare with other languages *on the same hardware*? When people say Ruby is slow they mean it is slow compared to something else (say Java). Ignoring comparison makes the argument weak.
You missed my point. I intentionally used a slow implementation of the language, with a slow application, running in a slow application container, using a single process/thread. It benchmarked at less than 34 requests a second. I conceded the slowness of that benchmark. In fact, I took efforts to make it a slow benchmark scenario. I was disappointed, in fact, that Redmine wasn't slower.
Most applications are heavy on the read operations relative to the write operations, and most of them have an uneven distribution of pages which are viewed. That is, people look at stuff a lot more than they change stuff, and they look at some stuff more than other stuff. That situation is ideal for optimizing through some sort of caching. And the cool thing about caching is that you don't have to cache anything for very long to make a tremendous difference in one's throughput. In my benchmarks any cached content was thrown away after less than a minute, forcing a refresh from the application. I could have cut that window down substantially and I still could have exceeded my challenge goal of pushing two hundred million requests through the stack in a day.
Requests per second: 2575.95 [#/sec] (mean)
That's from a set of a few hundred thousand requests done with a cache expiration window of two seconds. So, even for data that changes extremely frequently, fast caching is still a tremendous architectural win once a site starts seeing a lot of hits. And just for fun, if I do the same test while allowing keepalive requests:
Requests per second: 5955.06 [#/sec] (mean)
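Why does even a two-second window help so much? This back-of-the-envelope model rests on simplifying assumptions: a single hot cache key, steady traffic at a fixed rate, and exactly one miss per expiry window. Under those assumptions the application sees about one request per key per TTL, and everything else is a hit. The method names are hypothetical.

```ruby
# Back-of-the-envelope cache model for ONE cache key under steady load:
# each TTL window costs one miss (the refresh); the rest are hits.

# Requests per second that reach the application for this key.
def backend_rps(ttl)
  1.0 / ttl
end

# Fraction of requests for this key served from cache.
def hit_ratio(rps, ttl)
  requests_per_window = rps * ttl
  return 0.0 if requests_per_window <= 1

  1.0 - 1.0 / requests_per_window
end

[60, 2].each do |ttl|
  printf("TTL %2ds: backend sees %.3f req/s for this key, hit ratio %.5f\n",
         ttl, backend_rps(ttl), hit_ratio(2575.95, ttl))
end
```

Real traffic spreads across many keys (and, with cookie-keyed caching, many users), so actual hit ratios are lower. The shape of the curve is the point: once a key gets many requests per window, shortening the TTL barely dents the hit ratio.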
And that leads me to the second point of my article. I was using a version of a _Ruby_ caching proxy server to demonstrate something that was getting the job done at speeds comparable to or in excess of what I'd get if I had configured another caching reverse proxy to behave similarly. And I did it using a relatively SLOW implementation of the language. If I'd implemented everything in Java or C, would it have been faster? I'll concede that it would be. It probably would not be a lot faster, though, and I'd still be implementing it. Tradeoffs. But, clearly, one can get some pretty impressive performance from even a slow implementation of the language. I've been doing Ruby for a long time, professionally, delivering web apps and dynamic web sites, in Ruby, to businesses since well before Rails was a gleam in David Heinemeier Hansson's eye. People have whined about Ruby performance for a long time. Sometimes those whines are well informed, legitimate whines. Quite often they miss the forest for the trees, though.
Java IS faster (though it seems like the preponderance of sites implemented with Java are brutally slow, which is an interesting discontinuity). People use Java for a lot of very valid reasons, and the cool thing is that people using Java for all of those valid reasons can still use Ruby, via JRuby. The difference in actual performance between the Ruby Redmine and a Java version of Redmine developed to a comparable point is unlikely to be anywhere close to the difference in performance between a Java for-loop and the Ruby equivalent, though, and it wouldn't matter, anyway, because there are other architectural optimizations that result in much bigger gains.
You missed my point.
It's fine to intentionally use the benchmarks game as a foil for what you want to say – but not when to achieve that you misrepresent the benchmarks game.
1) You nit-pick about the ruby 1.8.7 version but do nothing to show that minor version tick makes any difference.
Will 2009-06-12 patchlevel 174 be significantly faster than 2008-08-11 patchlevel 72 ?
What if the newer version is a generic OS package compiled for i486 and the older version is compiled from source for i686 ?
2) You echo "flawed" micro-benchmarks without acknowledging that every benchmarks game web page links to the "Flawed Benchmarks" page – in fact "flawed benchmarks" is the first bold link at the top of the home page.
For the past 4 years "Flawed Benchmarks" has started – "we have found that the CPU time is rarely the limiting factor; the expressibility of the language means that most programs are small and spend most of their time in I/O and native run-time code."
Don’t just take Kirk's word for it, check out The Computer Language Benchmarks Game!
("seems like the preponderance of sites implemented with Java are brutally slow" – seems like empty propaganda.)
Follow the nesting of replies. I wasn't telling you that you missed my point. I wasn't replying to you at all.
I don't care that pl174 is the most recent version available to you in Ubuntu 9.10. That's more an indictment of Ubuntu 9.10 than anything else. It just doesn't matter to me at all in the context of your benchmarks or the relevance to my points in the article. I said it in the article, and I'll say it here. It's OK. Simmer down. No attack was intended, and I apologize if I did something to give you the impression that I was attacking the Shootout.
As for your other point... I'm not sure what you are shooting for here. So you and I agree? This is a problem? I didn't provide enough links to your stuff? Or put in enough disclaimers? I used links to the Shootout pages because they were convenient for illustrating my point. That's all. Again, I wasn't attacking your results, or your methodologies or the site's reason for being. I could have written the article without referring to your site at all, but it has some really interesting information in it that I thought some people might want to be pointed towards.
Regarding the Java comment, it was also in response to someone who was not you, and it's nothing more than a subjective comment based on subjective observations. It was in parentheses for a reason.
> It just doesn't matter to me at all in the context of your benchmarks or the relevance to my points in the article.
That's right – it has no relevance to the points made in your article – apparently for some other reason you felt compelled to point out that it wasn't close to being the most recent old Ruby.
No indictment of Ubuntu 9.10 at all – it was the most recent tarball available to them when they released. But I don't suppose you care about that either.
> Simmer down.
Maybe you have no understanding of how your words will be read, or you just don't care.
> but it has some really interesting information in it that I thought some people might want to be pointed towards
Oh! "The benchmarks there have to be taken with a couple grains of salt, anyway" was your way of suggesting that some people might find some really interesting information there!
> I could have written the article without referring to your site at all
That would have been better.
> Again, I wasn't attacking your results, or your methodologies or the site's reason for being.
Here are dictionary meanings for "take with a pinch (or grain) of salt":
- regard as exaggerated
- be incredulous about
- believe only part of
If Joe says take the results of the 100m sprint with a grain of salt, he's suggesting that one of the competitors was cheating at that event.
Joe isn't making the obvious point that the fact they are good sprinters (or bad sprinters) doesn't imply they are good (or bad) at the marathon or scrabble or auto racing or …
Pardon my ignorance if I missed something in the description of the test you ran; if I understand right, all 283,659,084 requests were for the URL "/", so (making a wild guess about how Redmine works) they all just fetched the Login screen? Offhand, this seems like it wouldn't use much of the Rails stack (or more than a tiny bit of Redmine), right?
Well, with Redmine, it'll show project information regardless of whether one is logged in or not, if that information is set to be public. Take a look at
http://redmine.ruby-lang.org/
I could have used any set of URLs, and they'd make nearly no difference in the results. Here's a run of a few hundred thousand requests against '/issues/show/1':
Requests per second: 2500.51 [#/sec] (mean)
That's running with a cache invalidation interval of two seconds instead of the 60 second interval I used in the examples in my article.
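The 2-second versus 60-second "cache invalidation interval" above is time-based expiry: a page is served from memory until it is older than the interval, then regenerated by the app. As a rough illustration only, here is a minimal Ruby sketch of that idea. The class name `TtlPageCache`, its `fetch` method, and the injectable clock are hypothetical and are not Swiftiply's actual code.

```ruby
# Hypothetical TTL (time-based expiry) page cache; an illustration, not Swiftiply's code.
class TtlPageCache
  Entry = Struct.new(:body, :stored_at)

  # +ttl_seconds+ is the invalidation interval (e.g. 2 or 60).
  # +clock+ returns the current time in seconds; injectable so tests can fake it.
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Serve a stored page while it is younger than the TTL; otherwise run the
  # block (standing in for the slow application), store the result, return it.
  def fetch(path)
    now = @clock.call
    entry = @entries[path]
    return entry.body if entry && now - entry.stored_at < @ttl

    body = yield
    @entries[path] = Entry.new(body, now)
    body
  end
end

# Usage with a fake clock: only the first request, and the first one after
# expiry, reach the "application".
fake_time = 0.0
cache = TtlPageCache.new(2, clock: -> { fake_time })
renders = 0
3.times { cache.fetch("/issues/show/1") { renders += 1; "<html>issue 1</html>" } }
fake_time = 2.5
cache.fetch("/issues/show/1") { renders += 1; "<html>issue 1</html>" }
puts renders # => 2
```

A shorter interval keeps content fresher at the cost of sending more requests through to the app; a longer one does the opposite.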
This is a completely meaningless benchmark. Output caching per user will always look amazing. The problem is that in the real world you _can't_ scale an app by output caching every page (number of users * number of distinct URLs * average page size = memory required).
So if you had 1000 different users (i.e. 1000 different cookies), how would your "scalability" look then?
How much memory do you need to cache every page of the app for every user?
True, but you can cache quite a few pages very easily. Most apps and dynamic sites show a lot of the same information over and over again, and a lot of them don't do tricky stuff like giving everyone who visits a unique cookie, even if they aren't logged in or authenticated in any way, like Redmine does.
And it takes precious little RAM to store a lot of pages. If a person had an app with 1000 unique users all using it at the same time, and each user had completely unique content (as you have to assume in a case like Redmine, where the app cookies everyone whether they are authenticated or not), you have a simple multiplication problem to figure it out. If each page is 6 KB of HTML, and one wants to store the top hundred pages per user (which is ridiculous), then 1000 * 100 * 6 KB = 600 MB. That's a pittance if one really has that many users and that difficult a site to cache. Most sites aren't that painful, so they are a lot easier to deal with.
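The sizing estimate above is plain multiplication. A small Ruby sketch of it follows, using a hypothetical helper name. It counts page bodies only and ignores per-entry overhead such as keys, hash structure, and Ruby object headers, so real memory use would be somewhat higher.

```ruby
# Hypothetical helper: memory needed to cache every (user, page) pair,
# counting page bodies only.
def cache_memory_kb(users:, pages_per_user:, page_kb:)
  users * pages_per_user * page_kb
end

kb = cache_memory_kb(users: 1_000, pages_per_user: 100, page_kb: 6)
puts "#{kb} KB = #{kb / 1000.0} MB (about #{(kb / 1024.0).round} MiB)"
# => 600000 KB = 600.0 MB (about 586 MiB)
```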
If one has an informational site that's heavy on read-only content, that's an ideal situation for some caching. For example, I have a 50-year-old Mercedes truck, so I read a Mercedes web forum. Even though there is a tremendous amount of information in the forum, there are a relatively small number of threads that are being viewed at any one time. One could take my code, add a simple HTTP-based mechanism to allow one's system to tell it to invalidate a cached entry (this would take very little time to do), and then set it up to cache the top hundred or so pages. It wouldn't use much RAM, and when a thread was modified, a quick call from the app would invalidate the cache. Most users hitting the forum would be seeing cached content, and never hitting the app. If someone searches for information on something esoteric like tuning Zenith 35/40 INAT carburetors, that traffic would hit the app, but for the majority of users the majority of the time, the very slow app would be spared from having to do anything at all.
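The forum scenario combines two ideas: cap the cache at roughly the hundred most-viewed pages, and let the application invalidate an entry when a thread changes. Below is a minimal Ruby sketch under those assumptions. `BoundedPageCache`, `fetch`, and `invalidate` are hypothetical names, the HTTP endpoint that would call `invalidate` is left out, and the least-recently-used eviction relies on Ruby 1.9+ hashes preserving insertion order (not the 1.8 MRI discussed in the article).

```ruby
# Hypothetical bounded page cache with explicit invalidation.
# Keeps at most +capacity+ pages and evicts the least recently used one.
# Relies on Hash preserving insertion order (Ruby 1.9 and later).
class BoundedPageCache
  def initialize(capacity)
    @capacity = capacity
    @pages = {}
  end

  # Serve a cached page, or run the block (the slow app) on a miss.
  def fetch(path)
    if @pages.key?(path)
      body = @pages.delete(path)
      @pages[path] = body # re-insert so it becomes the most recently used
      return body
    end

    body = yield
    @pages[path] = body
    @pages.shift if @pages.size > @capacity # drop the least recently used page
    body
  end

  # What an invalidation hook would call after, say, a thread gets a new reply.
  # Returns true if a cached copy was removed.
  def invalidate(path)
    return false unless @pages.key?(path)

    @pages.delete(path)
    true
  end

  def size
    @pages.size
  end
end

# Usage: a two-page cache.
cache = BoundedPageCache.new(2)
cache.fetch("/threads/1") { "thread 1" }
cache.fetch("/threads/2") { "thread 2" }
cache.fetch("/threads/1") { "not called" } # hit; /threads/1 is now most recent
cache.fetch("/threads/3") { "thread 3" }   # evicts /threads/2
cache.invalidate("/threads/1")             # thread 1 was edited
puts cache.size # => 1
```

Evicting by recency rather than by a fixed list keeps whichever threads are currently popular in memory without the app having to decide what the "top hundred" are.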
The cases where some caching can reduce load on the application servers vastly outnumber those that are such a pain in the butt that they just can't be helped by caching, and HTML pages are generally small enough that a huge number of them can be cached in very modest amounts of RAM.
"blah blah blah so long as you only try to do things within my narrow definition of a web site that is tailored perfectly towards my claims then you will be fine blah blah blah"
Don't bother trying to inject reality into a conversation with a Rails fanboy. They will redefine all the terms of the conversation in order to completely ignore every point that you make. You are absolutely right, but it won't make a difference... they will just scream at you about how nice the emperor's new clothes are.
Well, a cookie is being used in the benchmark, so I assume that instead of a login screen, Redmine was showing its main panel or whatever it shows on the home page.
Nice article. Reflects my experience in running not particularly enormous sites (but not tiny ones) — the performance issues (which are inevitable) were rarely to do with the language or its implementation; they were generally to do with our design choices. At least with Ruby you can spend more time fixing your architecture and design rather than just getting the first version ready.
http://www.playframework.org – For some reason the link in my post above is broken.
I'm finding 3k requests per second delivered from a Ruby stack hard to believe.
That would mean that each request was delivered in around a third of a millisecond… Now, just let that sink in before just nodding and smiling.
It is worthwhile to talk about caching, and the performance of various caching strategies, but if you take from this article that Ruby can deliver 3 requests per millisecond you're going to be very disappointed.
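The "third of a millisecond" figure comes from inverting the throughput. A Ruby sketch of that conversion follows; the method name is hypothetical. It gives the average time budget per request only if requests are handled strictly one at a time. A server that overlaps requests (an event loop, several processes) can sustain a given requests-per-second figure while each individual request takes longer than this.

```ruby
# Hypothetical helper: average milliseconds available per request at a given
# throughput, assuming requests are processed one after another.
def ms_per_request(requests_per_second)
  1000.0 / requests_per_second
end

[3_000, 2_500.51].each do |rps|
  printf("%.2f req/s -> %.3f ms per request\n", rps, ms_per_request(rps))
end
```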
If you are confusing Ruby with Rails, you'd be right: you won't get 3 requests per millisecond from a Rails app running in a single process on an AWS instance.
If you're asserting that one can't get 3 requests per millisecond from a _Ruby_ app, though, your assertion is false.
One can get significantly more than 3 requests per millisecond from a Ruby app, depending on what it's doing and the hardware. On a few-years-old Intel 5160-based machine (3.0 GHz Xeon), Swiftiply will bench out at 21+ requests per millisecond for cached responses, which seems to be around Apache Bench's own performance ceiling on that hardware.
One can certainly write Ruby software which handles many thousands of network connections per second, and it is not really even particularly difficult. Venerable old threaded Mongrel, on my AWS instance, for a simple handler, will do more than 2 requests per millisecond. Thin will do more than 3, as will evented_mongrel, on that AWS hardware. This isn't hard stuff.
Scope the problem right, and don't do stupid things when writing the code, and your solution can be pretty damn fast.
You are finding it difficult to believe because it's a lie. 3k requests per second for static data served from memcache != 3k requests per second delivered through the Rails stack.
It seems like the title of this article should be, "anything scales with a sufficient amount of caching."
I would ask another question, which is "how much time do I have to spend implementing caching logic in order to build a production Rails application that will perform well." In my experience, a lot more time than I ever spent in Java. At the moment, with Rails, the failure is that some of this isn't addressed at a layer below and outside of application design.
Yes. Part of it is that — caching is a great equalizer.
The other part is that I implemented that great equalizer _in Ruby_ and it does its job with considerable speed. It's possible to write things that run in Ruby, leverage Ruby's strengths, and still deliver respectable performance for the job at hand.
Regarding Rails, I don't want to conflate Rails the framework with Ruby the language. I've done well over a hundred production dynamic web sites and web apps since 2002, in Ruby, and not one of them was done in Rails, and most of the time I didn't worry at all about caching at any level unless the lack of caching caused the app to get bottlenecked at the database. Ruby's level of performance was well above adequate.
Now, with Rails specifically, I still think that most of the time one has to be doing either some pretty heavy things or some pretty ill-considered things in one's code to have to worry about any caching early in the development cycle. I've seen people, in the past, complain that their Rails app is only getting 3 requests/second. My first suspicion, on hearing that, is that they are doing some terrible things in their application, that they are doing something incredibly CPU-intensive, or that they are bottlenecked on something external to their app. Rails, while not the fastest web development framework for Ruby, just isn't THAT slow.
Also, I do know from talking with Yehuda that Rails 3 is designed to be much more caching friendly than previous generations of Rails, and that while it won't ship with ESI support built in, there should be a plugin coming along that makes ESI simple to leverage. That will be a huge boon to Rails application developers who want to leverage caching with highly dynamic or personalized content.
Rails apps with any considerable usage crash and burn without caching, so plan to spend a good portion of your development cycle working on it.
Great article, but isn't using EventMachine cheating as it breaks your self-imposed rule of using a pure Ruby stack? EM relies on C code compiled as part of its gem install.
It's pointless to have an opinion if you only work in one language or the other. I work in both (and have ever since both languages began) and still find ruby to be the preferred language based on all the typical reasons given. Opinions expressed without any personal perspective are meaningless.
Kirk, you aren't going to deliver the updated Swiftiply source, are you?
Now if you'd said you'd tweaked Rails and it was exercising the entire stack that fast, I'd be ecstatic. As it is, I'm left wondering "how does your modified Swiftiply know when to invalidate the cache?"