My Summer of (Open) Source

By | January 5th, 2012 at 12:01PM

The last few months have been a great experience for me. I’m a graduate student from Potsdam, Germany. However, as some of you might already know, I’m also rather active in the Ruby community. This past year, I had an amazing opportunity.

Engine Yard sponsors a few Open Source developers to work full time on their projects. When I asked Dr. Nic Williams whether they would sponsor me to spend three months in Portland working with Brian Ford on Rubinius, I expected nothing but a no. As it turned out, Engine Yard was at least as thrilled about the idea as I was. A few days ago I finally got back to Germany, and I want to give you a quick overview of what I worked on during my time overseas.

Like many others, I started contributing to Rubinius a while ago, but I never really dared to play with its internals. So my first stop was the Rubinius compiler. To make sure I really understood it, and that it is as flexible as it claims to be, I wrote a Smalltalk implementation on top of the Rubinius compiler infrastructure and looked into improving its API.

It’s a fun thing to do, as the Rubinius compiler is written entirely in Ruby. And since Rubinius is bootstrapped, the compiler also runs on other Ruby implementations. That is how Rubinius is usually installed: you load the compiler in CRuby, and it then compiles the compiler itself to Rubinius bytecode. If you want to look into this, there is excellent documentation available on the Rubinius website.

This bytecode is then executed by the virtual machine, which was my next stop. It took me a while to fully understand how things work within the VM. It is the only major part of Rubinius not written in Ruby, and the main reason for its blazing performance and low memory footprint. I am planning on writing another blog post, or possibly even a series of posts, about these internals.

Apart from bug fixes and API improvements, I used this knowledge to fix, for instance, the implementation of one of Ruby’s least known and most confusing features: flip-flops.
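For readers who have never run into one: a flip-flop is a range literal such as (cond1)..(cond2) used as a condition. It switches on when the left condition becomes true and stays on until the right condition is true. A minimal illustration (the method name and the input are made up for this example):

```ruby
# A flip-flop only behaves this way in a conditional context (here: `if`).
# It turns "on" at the first "BEGIN" and turns "off" after the next "END".
def between_markers(lines)
  lines.select { |line| true if (line == "BEGIN")..(line == "END") }
end

p between_markers(%w[a BEGIN b c END d]) # => ["BEGIN", "b", "c", "END"]
```

With the two-dot form, the end condition is already checked on the element that switched the flip-flop on; the three-dot form waits until the next element before checking it.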

The last thing I worked on was Puma, a new web server for Rails/Rack/Sinatra applications. Rubinius 2.0, which is about to be released, can make full use of all your CPUs. However, most web servers used for deploying Ruby applications are actually single-threaded. Since there was no real threaded option that is still maintained and not JRuby-specific, Evan Phoenix and I started working on a new server.

Like many other servers, Puma uses the fast HTTP parser that comes with Mongrel. It also uses a dynamically sized thread pool to process requests in parallel. With Puma, you now have a go-to choice for deploying web applications on Rubinius. And since it does not contain any Rubinius-specific code, it also works well on JRuby and CRuby.
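To illustrate what a dynamically sized thread pool means, here is a minimal Ruby sketch. It is not Puma's code: the class name, the min/max parameters and the grow-when-busy rule are assumptions made for illustration. The pool starts with a minimum number of worker threads and spawns another one, up to a maximum, whenever a job arrives while no worker is idle.

```ruby
# Hypothetical sketch of a dynamically sized thread pool; the class and
# its methods are illustrative and are not Puma's actual API.
class DynamicThreadPool
  STOP = Object.new.freeze # marker that tells a worker to exit

  def initialize(min: 1, max: 4, &handler)
    @min = min
    @max = max
    @handler = handler
    @queue = Queue.new
    @mutex = Mutex.new
    @workers = []
    @idle = 0 # workers currently blocked waiting for a job
    @mutex.synchronize { @min.times { spawn_worker } }
  end

  # Enqueue a job, growing the pool if no worker is idle and we are
  # still below the maximum size.
  def <<(job)
    @mutex.synchronize do
      spawn_worker if @idle.zero? && @workers.size < @max
    end
    @queue << job
    self
  end

  def size
    @mutex.synchronize { @workers.size }
  end

  # Queue one stop marker per worker (after all pending jobs), then wait.
  def shutdown
    workers = @mutex.synchronize { @workers.dup }
    workers.size.times { @queue << STOP }
    workers.each(&:join)
  end

  private

  # Caller must hold @mutex.
  def spawn_worker
    @workers << Thread.new do
      loop do
        @mutex.synchronize { @idle += 1 }
        job = @queue.pop
        @mutex.synchronize { @idle -= 1 }
        break if job.equal?(STOP)
        @handler.call(job)
      end
    end
  end
end
```

For example, DynamicThreadPool.new(min: 1, max: 16) { |request| handle(request) }, with handle as a hypothetical stand-in for your application, grows under load. A production server would also shrink idle workers and deal with exceptions raised by the application; this sketch does neither.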

To make sure we are heading in the right direction, I started working on a tool for benchmarking web applications under realistic load. The main issue with just using ab, the standard tool for measuring HTTP performance, is that it produces unrealistic numbers on both JRuby and Rubinius. With ab, you send the same request over and over again, which lets the JIT and the inliner optimize heavily for exactly that request. That usually doesn’t reflect actual production behavior. I therefore wrote code that simulates a real browser session and, of course, runs many of these sessions in parallel.
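The idea can be sketched roughly as follows. The site map, the random-walk model of a visitor and the injected fetch callable are assumptions for illustration, not the actual tool: each simulated session starts at the front page and follows random links, so the server sees a varied mix of requests instead of one URL repeated.

```ruby
# Hypothetical site map: which links a simulated visitor may follow next.
LINKS = {
  "/"           => ["/products", "/about"],
  "/products"   => ["/products/1", "/products/2", "/"],
  "/products/1" => ["/cart", "/products"],
  "/products/2" => ["/cart", "/products"],
  "/about"      => ["/"],
  "/cart"       => ["/"]
}.freeze

# One simulated browser session: start at "/" and follow random links,
# so consecutive requests differ instead of repeating a single URL.
def browse(fetch, clicks:, rng:)
  path = "/"
  visited = []
  clicks.times do
    visited << path
    fetch.call(path)
    path = LINKS.fetch(path).sample(random: rng)
  end
  visited
end

# Run several sessions in parallel threads and record each request's latency.
def run_sessions(fetch, sessions:, clicks:, seed: 42)
  latencies = Queue.new
  timed_fetch = lambda do |path|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    fetch.call(path)
    latencies << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  threads = Array.new(sessions) do |i|
    rng = Random.new(seed + i) # reproducible, but different per session
    Thread.new { browse(timed_fetch, clicks: clicks, rng: rng) }
  end
  paths = threads.map(&:value)
  [paths, Array.new(latencies.size) { latencies.pop }]
end
```

Against a real server, fetch could be something like ->(path) { Net::HTTP.get(URI("http://localhost:9292#{path}")) } after require "net/http", assuming the application listens on that address. Seeding each session's Random keeps a run reproducible while still giving every session a different path.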

You think that’s all? Far from it! The Engine Yard OSS Community Grant Program also enabled me to speak at six different conferences across North and South America. At Rocky Mountain Ruby, RubyConf Brazil and RubyConf Uruguay, I gave a talk on “Real Time Rack”. In San Francisco, at GoGaRuCo, I gave a presentation about “Smalltalk On Rubinius – or How To Implement Your Own Programming Language”. At this past year’s RubyConf in New Orleans, I spoke about “Message in a Bottle”, and last but not least, I gave a presentation titled “Beyond Ruby” at RubyConf Argentina in Buenos Aires.

Tao of Documentation

By | January 3rd, 2012 at 4:01PM

In November I was given the honor of speaking at RubyConf Uruguay 2011. If you have a chance to go to a conference in 2012, I highly recommend heading to South America. The conferences there coordinate as a tour, so you can start in Chile or Colombia and work your way down to Argentina and Uruguay. The Uruguayan conference organizers are amazing. Big props to Evan, Nicolás, Pablo and the rest of the crew.

Pain

I gave a talk on why I believe it is extremely important for companies to have very detailed documentation on how to use their product, and on how they can make it easier for their developers to help with that process without making them feel like they are wasting their time. It frustrates me a lot when I use a product that I really like but cannot for the life of me figure out how to use it because the documentation is non-existent. The worst is when I find out about a feature by reading another user’s blog post about how he or she was digging around and found this awesome hidden feature. Seriously!! I should not have to dig around to find out how to use your product.

Sadly, open source projects are not exempt from this offense. I am not referring to documenting the codebase with YARD, RDoc or TomDoc, but rather to having good examples, how-to guides and FAQ sections. Look at projects such as fog, Riak and Renee. Early Ruby on Rails users will remember how difficult it was to figure out all the aspects of Rails, while new users benefit greatly from the amazing Ruby on Rails Guides, which are constantly updated.

Journey

“Do the difficult things while they are easy and do the great things while they are small.” This quote is from my favorite philosopher, Lao Tzu. Taoism is awesome, and if you do not know anything about it, I recommend reading The Tao of Pooh, but I digress. Engine Yard definitely did not follow Lao Tzu in this regard, and we felt the pain when we decided to fix the situation. Please DO NOT start this process late. I know it seems painful, but just like with TDD, writing good documentation for your users will keep you sane and happy in the long run. In the early days our documentation was not updated frequently at all, and that was frustrating. We had set up DokuWiki and found out later on that it was not the most intuitive wiki to use, but it was better than having nothing at all. “A journey of a thousand miles must begin with a single step.” Well, it took us a while, but we finally took that step and found the right tools and workflow to completely overhaul our documentation, and I can proudly say that it kicks ass now.

Solution

There are a lot of tools out there that let you write great documentation quickly and easily, so honestly there is no excuse for poor or missing documentation. Gollum, stasis and nanoc are just a few of the tools you can look into. After looking over the options, we decided on gollum together with gollum-site, which converts the Markdown-formatted gollum pages into static HTML pages. We created a public ey-docs repo on GitHub so that it is easy for anyone to contribute to our documentation. Because we use Markdown, anyone comfortable with the syntax can clone the repo, make changes in their own text editor and push those changes up. These tools also make it easy to change the style of the documentation site and to structure the categories and pages appropriately. We even have a release notes section with an RSS feed so that our customers can stay up to date on all the new updates we release to our technology stack.

Results

21 people have now contributed, and the number of visitors to our documentation has increased by 140%. Our guide to getting set up for Ruby on Rails development was so good that the organizer of Tijuana.rb translated it into Spanish.

All Winnie-the-Pooh wants is to live a peaceful life full of joy. Shouldn’t you want the same for yourself and your users? Go create some amazing documentation.

If you have your own way of writing documentation, let me know about the tools and workflow you use in the comments below.

If you would like to listen to the talk and see the slides, here is the link: Tao of Documentation. The link to the audio is in the description section.

It’s All About 2012

By | December 29th, 2011 at 11:12AM

Happy (almost) New Year! 2011 was certainly a year for the books in terms of Ruby, node.js, PHP, Rubinius and JRuby, but we already know that 2012 will be even greater.

CodeMash

January 11-13 | Sandusky, OH

We’re kicking off the year by hosting CodeMash’s after-party in Sandusky, Ohio. We’re very excited to be heading out of San Francisco and into the frosty Midwest winter, especially to hear Evan Machnic speak at CodeMash about Rails development on Windows. Crazy, right?! You’ll have to be there to believe it. We’ll also have an awesome booth at CodeMash with brand new Engine Yard swag, including t-shirts. Come say hello and check them out!

SF JRuby Meetup

January 19 | San Francisco, CA

Then it’s back to San Francisco for the SF JRuby Meetup to hear Xavier Shay and Steve Connover talk about Square’s use of JRuby. Xavier and Steve will present the history of JRuby at Square, an epic quest that saw our protagonists face the demons of Kirk, Mizuno, Jetty, Neo4j, threads, startup times, and cross-Ruby compatibility, before emerging victorious with a setup fit for the gods. This event will take place at Engine Yard’s headquarters on January 19th at 6:30 pm.

Neo for Ruby-Jay Meetup

January 12 | San Francisco, CA

January’s Neo for Ruby-Jay Meetup will also take place at our headquarters on January 12. Andreas Kolleger will dive into the current options for building and deploying a Ruby app backed by Neo4j. We know that the ’4j’ might look a bit suspicious, but we promise that it’ll be awesome.

LessConf

February 23-24 | Atlanta, GA

We’ll also be hosting LessConf’s after-party in Atlanta, Georgia. Come check out summer camp for startups in the middle of winter! If you are forward-thinking about Ruby on Rails, love getting inspired and like puppies, you won’t want to miss this cool conference.

JRubyConf

May 21-23 | Minneapolis, MN

Finally, we’re already starting to gear up for JRubyConf 2012. Early Bird tickets are on sale until January 25th, so get yours now! And if you’d like to sponsor or speak, please email events@engineyard.com. This will be the fourth year of the conference, and it’s going to be the biggest and best yet.

With all the great events we’ve got planned in 2012, we can’t imagine a better way to ring in the new year.

Special JRuby Release: 1.6.5.1

By | December 28th, 2011 at 11:12AM

For the Impatient

  1. JRuby 1.6.5.1 is a single patch release of JRuby 1.6.5 to fix CERT advisory: CERT-2011-003.  ALL USERS: PLEASE UPGRADE
  2. We talk about plans for the upcoming 1.6.6 release

CERT Details

Hashing 101

(For proper computer science vocabulary and lots of fun details about hashing, also read this Wikipedia article.)

Hash tables apply a math function (the hashing function) to the key of a key/value pair. The result of the hashing function identifies a hash bucket, which stores the key/value pair internally:

a[:heh] = 1
hashing_function(:heh) -> store :heh/1 in hash bucket #3
a[:foo] = 2
hashing_function(:foo) -> store :foo/2 in hash bucket #13
a[:bar] = 3
hashing_function(:bar) -> store :bar/3 in hash bucket #1

Hashes have many buckets, and in theory all key/value pairs added to a hash get spread out evenly across its buckets. In practice, some keys will end up hashing into the same bucket (known as a hash collision). As more key/value pairs are stored in the same bucket, accessing those particular pairs slows down. This is because you need to walk some portion of the entries in the bucket to find the specific one you are looking for (hash tables often store the entries of an individual bucket in a simple list).

a[:gar] = 4
hashing_function(:gar) -> store :gar/4 in hash bucket #3 (same bucket as :heh)

In this example, accessing a[:gar] and a[:heh] may take longer than the other keys because they are sharing a hash bucket.
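The pseudocode above can be turned into a small runnable Ruby toy. This is an illustrative chained hash table, not how JRuby's Hash is actually implemented: each bucket is a plain array of key/value pairs, and a counter records how many key comparisons were needed.

```ruby
# Toy chained hash table for illustration only; it is not how JRuby's Hash
# is implemented. Each bucket is a plain Array of [key, value] pairs.
class ToyHash
  attr_reader :comparisons

  # hash_fn defaults to Ruby's own #hash; pass a block to override it.
  def initialize(buckets: 16, &hash_fn)
    @buckets = Array.new(buckets) { [] }
    @hash_fn = hash_fn || ->(key) { key.hash }
    @comparisons = 0
  end

  # The hashing function picks the bucket; % keeps the index in range.
  def bucket_index(key)
    @hash_fn.call(key) % @buckets.size
  end

  def []=(key, value)
    bucket = @buckets[bucket_index(key)]
    pair = find_pair(bucket, key)
    if pair
      pair[1] = value
    else
      bucket << [key, value]
    end
  end

  def [](key)
    pair = find_pair(@buckets[bucket_index(key)], key)
    pair && pair[1]
  end

  private

  # Walk the bucket entry by entry: the more keys share a bucket,
  # the more comparisons every access to it costs.
  def find_pair(bucket, key)
    bucket.find do |pair|
      @comparisons += 1
      pair[0] == key
    end
  end
end
```

Passing a block replaces the hashing function, so ToyHash.new { |_key| 3 } forces every key into bucket #3, just like :heh and :gar in the example above.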

The Attack

The general idea of the attack is for “the bad guys” to figure out a large set of keys which all hash to the same bucket. Once they have this list, they send all those keys to a server, which stores them in a hash (think of the parameter list in Rack, for example). Storing or accessing any of those values takes longer and longer as the number of entries in that single bucket grows. If enough values get stored, the result is a denial-of-service (DoS) attack.

hashing_function(:hostname) -> hash bucket #3
hashing_function(:aZ1) -> hash bucket #3
hashing_function(:cvg) -> hash bucket #3
hashing_function(:azr) -> hash bucket #3
... # many elided
hashing_function(:1fr) -> hash bucket #3
hashing_function(:yu3) -> hash bucket #3
hashing_function(:hyX) -> hash bucket #3
host = params[:hostname] # Uh oh! need to find this amongst many bucket buddies
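A quick way to see why this hurts is to count the work. The following sketch (the helper name and the bucket count of 64 are arbitrary assumptions) inserts 1,000 distinct keys into a chained table twice: once with keys spread over all buckets, and once with an attacker's key set where every key lands in bucket #3.

```ruby
# Count the key comparisons needed to insert distinct keys into a chained
# table. Because every key is new, each insert scans its whole bucket
# before appending.
def insert_comparisons(keys, buckets: 64, &hash_fn)
  table = Array.new(buckets) { [] }
  keys.sum do |key|
    bucket = table[hash_fn.call(key) % buckets]
    scanned = bucket.size
    bucket << key
    scanned
  end
end

keys = (1..1000).to_a
spread   = insert_comparisons(keys) { |k| k }   # keys spread over all buckets
attacked = insert_comparisons(keys) { |_k| 3 }  # attacker: everything in bucket #3
puts spread   # => 7320
puts attacked # => 499500
```

Spread out, each bucket holds about 16 keys, so the total stays small. When everything collides, the i-th insert scans i - 1 entries, giving n(n - 1)/2 comparisons in total: doubling the number of keys roughly quadruples the work.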

The Fix

Adding a little bit of randomization to the hashing algorithm makes it much, much more difficult to figure out how to generate this type of attack. JRuby 1.6.5.1 and all later JRuby releases have this additional randomization built into the hashing algorithm. The result should be a decent distribution across hash buckets that is difficult for attackers to predict.
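The sketch below shows the idea with a deliberately simple toy; it is not JRuby's actual hashing code. The unseeded polynomial hash (multiply by 31 and add each byte) is public, so colliding keys such as "Aa" and "BB" can be found once and reused against every server. The seeded variant draws its multiplier and start value from a per-process random generator, so a collision list prepared offline no longer lines up. Even with secret parameters, a polynomial hash like this is not cryptographically strong; real implementations rely on more robust keyed hash functions.

```ruby
MASK = 0xFFFF_FFFF # keep hashes to 32 bits

# Unseeded polynomial string hash (h = h * 31 + byte). Anyone can compute
# it offline, so colliding keys can be prepared in advance.
def weak_hash(str)
  str.each_byte.reduce(0) { |h, byte| (h * 31 + byte) & MASK }
end

# Toy randomized variant: the multiplier and start value are secrets drawn
# once per process, so "Aa"/"BB"-style collisions no longer line up.
class SeededHasher
  def initialize(rng = Random.new)
    @multiplier = rng.rand(MASK) | 1 # random odd multiplier
    @start = rng.rand(MASK)
  end

  def hash_of(str)
    str.each_byte.reduce(@start) { |h, byte| (h * @multiplier + byte) & MASK }
  end
end

p weak_hash("Aa") == weak_hash("BB")     # => true (both are 2112)
p weak_hash("AaBB") == weak_hash("BBAa") # => true
```

Because the secret changes whenever the process starts, the same key also hashes to different values from one run to the next, so nothing should depend on hash values being stable across runs.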

More information

This vulnerability is not unique to JRuby. Other Ruby implementations also have a similar issue (also patched today). In fact, Java and PHP also appear to be susceptible to this style of attack. For more information, please see the CERT announcement.

Also, keep in mind that language implementations are really only susceptible to this attack through frameworks which allow an external attacker to store arbitrary and/or unbounded keys and values in a hash. Rack had this vulnerability, but it has been fixed so that the amount of parameter data stored is bounded, removing the possibility of a DoS attack. Rack users should upgrade to the latest version.
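A framework-level limit might look roughly like this. It is a hypothetical sketch, not Rack's implementation: the method name, the error class and the 1024-byte default are assumptions. It stops filling the params hash once the parameter keys add up to more than a fixed number of bytes, which also caps how many attacker-chosen keys can reach the Hash.

```ruby
require "uri"

class ParamsTooLarge < StandardError; end

# Hypothetical sketch (not Rack's code): refuse to keep inserting into the
# params hash once the total size of the keys exceeds max_key_bytes.
def parse_params(query, max_key_bytes: 1024)
  params = {}
  key_bytes = 0
  URI.decode_www_form(query).each do |key, value|
    key_bytes += key.bytesize
    if key_bytes > max_key_bytes
      raise ParamsTooLarge, "parameter keys exceed #{max_key_bytes} bytes"
    end
    params[key] = value
  end
  params
end
```

A request with a reasonable query string, such as "hostname=example.com&port=80", parses normally, while a body stuffed with thousands of crafted keys is rejected before most of them are ever hashed.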

JRuby’s First Security Fix-Only Release

We debated releasing everything on our 1.6 branch together with the hashing vulnerability fix (described above) as 1.6.6. This was unappealing for a couple of reasons:

  1. Users with stable environments deployed on 1.6.5 would have to evaluate not only this security fix but every other fix we placed on the JRuby 1.6 branch in the last two months. We suspected this would force more conservative users to perform their own builds to apply just the security fix.
  2. Of the bugs we wanted fixed for JRuby 1.6.6, we felt we were still about 10 short.

After consideration, we felt it best to ship a single security patch release now (JRuby 1.6.5.1; please update to it now) to satisfy the cautious, and to hold 1.6.6 until we felt good about its quality. As they say, Open Source projects are ready when they are ready…

Hey! When will you be ready? What is missing?

It has been about two months since our last release and we suspect we can wrap things up in the next couple of weeks.  We plan on releasing JRuby 1.6.6 in mid-January.

As we have been saying throughout the 1.6 series, we are primarily fixing 1.9 compatibility bugs. Generally speaking, our 1.9 fixes have been dominated by encoding errors in Regexp, IO and String. Here is a list of what we have done so far. It is also worth mentioning that we fixed the Fiber regression in JRuby 1.6.5 (JRUBY-6170). The dreaded missing ‘read_nonblock’ has also been fixed (JRUBY-5529).

Here is the list of issues we plan on settling for 1.6.6. A few noteworthy items on this list are JRUBY-5657 (new 1.9 splat behavior), JRUBY- (new 1.9 to_ary behavior) and JRUBY-6067 (Windows YAML issue).

If there is an issue we haven’t targeted that you think is drop-dead important, please let us know. We are willing to expedite other issues if presented with a reasonable case for why they should be fixed. Please join the discussion.

Cloud Out Loud Interview with the Authors of PHP Master: Write Cutting Edge Code

By | December 27th, 2011 at 4:12PM

We’re glad to release this podcast we did with Davey Shafik, Lorna Mitchell and Matthew Turland, the authors of a new SitePoint book called PHP Master: Write Cutting-Edge Code.

You can read more about the podcast, check out related links, and have a listen at the Engine Yard Developer Center.

Thanks much to the authors for participating. You guys rock!