Rubinius: The Book Tour

July 2nd, 2009 at 9:07AM

This year continues to be a hot one for the Ruby programming language. The use of Ruby is growing, excitement is mounting for the release of Rails 3.0, and development of Ruby 1.9 and the alternative implementations is moving along quickly. It makes sense: bringing more value to your customers in less time with fewer resources is an obvious plus, and Ruby’s a great way to make that happen.

Rubinius, which you’ve no doubt heard lots about over the last few years, is an implementation of the Ruby language written from scratch using cutting edge technology and the best industry research. Based on the questions we’ve received over the past few months, it’s clear that a lot of folks are looking to learn more about the technologies behind the project. This is exciting because with so much written in Ruby, Rubinius positively begs Ruby developers to experiment and explore.

In this post I’ll describe each of the basic parts of Rubinius, and provide some helpful links to books that I’ve found particularly useful in understanding how Rubinius is built.


6 Steps To Refactoring Rails (for Mere Mortals)

June 29th, 2009 at 9:06AM

Since December, Rails has undergone a fairly significant internal refactoring in quite a number of areas. While it was quite tricky at first, we mere mortals have started to hone a process for diving into a new area of the codebase and emerging some time later with a much improved area that does basically the same thing. Here’s the approach we’ve adopted and advocate:

First, refactoring needs to be refactoring, not revision. By that I mean that while you are in the process of invasively improving some code, it is not the appropriate time to also change the functionality of that code. If you do both at the same time, it will be difficult to track down whether a bug in the code is the result of refactoring or functionality changes.

We’ve held fast to this requirement for the Rails 3 work Carl and I have been doing, which has resulted in an extremely stable edge, despite making fairly invasive changes.

Second, any kind of significant refactoring without tests is folly. The first thing you should do is take a look at the test suite for the area in question and beef it up if necessary.

Thankfully, Rails has a fairly reasonable test suite, and the addition of Sam Ruby’s Agile Web Development on Rails test suite has provided an additional level of confidence in the changes we’re making.

Third, once you’re ready to dive in, read through the code carefully. It can be tempting to just go in and hack away at a particularly egregious part of the codebase, but you’ll frequently be changing code that exists for a reason.

Something I’ve noticed both in Rails applications and in Rails itself is that code that looks very strange at the beginning of a period of refactoring tends to exist for a reason.

Fourth, as you proceed, make very small changes, then run the full test suite after every change. Commit often. What you want to look for is cases where the boundary APIs around the code you’re writing are messy (so you have multiple ways in to a particular class or area of code where one would suffice).

One Rails example would be rendering a template in ActionView from ActionController. When I started in December, ActionController called into ActionView using a number of public and private APIs, so making any changes around those boundaries was very tricky. Some things we wanted to do, like improving the way layouts were selected, were too complex because of the number of ways templates and layouts were rendered.

The very first thing I did in the early days of the merge was work toward reducing the number of ways that ActionController told ActionView to render a template. In the end, we settled on just a single API: render_template_from_controller, which takes a Template object for the template to render, and a Template object for the layout. Once this was done, it became a lot easier to make changes on either side of the boundary, without fear that a small change in ActionView could break any number of things in ActionController.

Of course, this assumes that you understand what your boundaries are. This is something that’s learned over time, but a fundamental requirement in good refactoring is having functionality broken up into units that are easy to understand, with small surface area. This is commonly achieved using classes, which is a good starting point, but Ruby has other tricks up its sleeve as you get more advanced, like judicious use of modules (and the new Rails ActiveSupport::Concern).

Fifth, once you have reasonable boundaries, dive in and start making changes. A pretty good rule of thumb is to clean up cases where a public API has started being used for private, internal use. This might mean that changing the internals of your code breaks the public functionality (which, again, should be sacrosanct during this process). Have a zero-tolerance policy for failing tests as you make small changes, especially as you separate out public and private functionality.

One example of this in Rails was extensive usage of ActionView’s public render method by private functionality. As a result, the public render method had snippets of code inside to handle special cases (like render :file taking a Template object). The solution in this case was to extract out the private functionality, and have the public render method as well as the private internals call the new extracted methods. This ensures that internal functionality is kept internally, where it can be refactored more easily.
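The extraction described above can be sketched in plain Ruby. This is only an illustration under assumptions: SketchView, render_for_controller, lookup_template and render_template are hypothetical names invented here, not the actual Rails internals, and templates are plain strings to keep the sketch self-contained.

```ruby
# Hypothetical view class; none of these names come from Rails itself.
class SketchView
  # Public API: handles the options hash that outside callers pass in.
  def render(options = {})
    template = lookup_template(options.fetch(:file))
    render_template(template, options[:layout])
  end

  # Entry point for internal callers that already hold a template,
  # so they never need special cases inside the public render method.
  def render_for_controller(template, layout = nil)
    render_template(template, layout)
  end

  private

  # Stand-in for template lookup: wraps the name in angle brackets.
  def lookup_template(name)
    "<#{name}>"
  end

  # The extracted shared logic that both entry points call.
  def render_template(template, layout)
    body = "rendered #{template}"
    layout ? "#{layout}[#{body}]" : body
  end
end

view = SketchView.new
puts view.render(:file => "index")
puts view.render_for_controller("<index>", "application")
```

Because both entry points funnel into one private method, the internals can change without touching the public option handling.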

Sixth, don’t be afraid to git reset --hard if you find yourself sinking into quicksand, with rising confusion due to changes you made. Over the course of working with Rails, I’ve lost an hour or more at a time to changes made too rapidly and carelessly, and the only advice I can give is to give up on ratholes as early as you notice them.

So that’s it. Six easy steps to refactoring Rails.

Pair-Programming Should Be Co-Programming

June 25th, 2009 at 7:06AM

Back in 2005 a pair of Stanford students asked me if they could observe the pair-programming environment at the company I was working for. They were working on a project to challenge the notion that two people pair-programming had separate roles of “driver” and “navigator,” a common notion of how pair-programming should work at the time. Back then, we had a traditional pair-programming setup: 1 desk, 1 keyboard and mouse, 1 computer and (of course) 2 people. What they observed was downright painful!

As one example, they recorded a session where someone was verbally dictating syntax and keyboard actions to their pair:

Hugh: So…
Ilya: Parenthesis. So percent, getNewArgs… [Hugh types.]
Exactly. So save off those two lines in the new method.
Hugh: Uh…
Ilya: Right…down, down, down, there we go.
Hugh: So we…
Ilya: So, percent getNewArgs equals percent args [Hugh
types this line to terminal.] Uh, I think that’s it.
Hugh: This?
Ilya: Yeah, that’s all we want to do. Get rid of the blank line
and close the new.

From a pair programming session revealing the perils where one person “drives” while the other “navigates.” Excerpt from “The Social Dynamics of Pair Programming.”

This was clearly the wrong way to go about pair-programming. “Driver” and “navigator” was turning out to be closer to “driver” and “back-seat driver,” and like all experiences of back-seat driving, it could be frustrating for the driver and generally unproductive. What we’ve found at Engine Yard is that it’s far better to optimize the pair-programming environment not for a “driver” and “navigator,” but for co-programming.

Jon Crosby and Ezra Zygmuntowicz pair-programming at Engine Yard


A good co-programming environment should reduce the friction for any task, and has three rules:

  1. Create a shared environment, where the pair can fully immerse itself in the problem at hand.
  2. Make it easy for a member of the pair to ‘fork’ off and not interrupt flow.
  3. Remove any obstacles that get in the way of completing each other’s sentences (or syntax).

The Engine Yard Pair-Programming Setup

Two keyboard and mouse sets: This alone dramatically improves a pairing environment. Often we see one member of the pair ‘hovering’ over the keyboard — a non-verbal cue indicating that they want to take over. It’s amazing how effective code can be in expressing an idea over a verbal description or notepad sketches. No more oral syntax descriptions!

Dedicated pair-workstations: Identical workstations with identical configurations, including editor. We use iMacs with nice big screens. Similar environments make it easy for pair switch-up. Everyone is familiar with the environment on the pair-stations, so there is no re-learning a new environment depending on who you happen to be pairing with.

3-Computer Setup: Each member of the pair brings their laptop to dock alongside the pairing station. This lets either person do research or kick off long-running processes without losing context on the dedicated workstation. While it takes more discipline to stay on task, we think it’s worth the flexibility.

I don’t claim that this is the perfect environment for all situations; but it’s something that works well for us.

Introduction to BDD with Cucumber

June 23rd, 2009 at 7:06AM

Cucumber is a framework for writing and executing high level descriptions of your software’s functionality. Call these tests, examples, specifications, whatever… it doesn’t matter too much. What I’m talking about has traditionally been called functional, integration, and/or system tests. In XP terms this includes tests called Story Tests, Customer Tests, and/or Acceptance Tests.

One of Cucumber’s most compelling features is that it provides the ability to write these descriptions using plain text in your native language. Cucumber’s language, Gherkin, is usable in a growing variety of human languages, including LOLZ. The advantage of this is that these feature descriptions can be written and/or understood by non-technical people involved in the project.

One important thing to keep in mind is that Cucumber is NOT a replacement for RSpec, test/unit, etc. It is not a low level testing/specification framework.

Cucumber plays a central role in a development approach called Behaviour Driven Development (BDD).

A Bit About BDD

Dan North describes BDD as “writing software that matters” [in The RSpec Book] and outlines 3 principles:

  1. Enough is enough: do as much planning, analysis, and design as you need, but no more.
  2. Deliver stakeholder value: everything you do should deliver value or increase your ability to do so.
  3. It’s a behavior: everyone involved should have the same way of talking about the system and what it does.

BDD in its grandest sense is about communication and viewing your software as a system with behaviour. BDD tools such as RSpec and Cucumber strive to enable you to describe the behavior of your software in a very understandable way: understandable to everyone involved.

A Quick Primer on Sharding for Ruby on Rails

June 18th, 2009 at 7:06AM

Sharding is usually the final strategy to reach for when scaling a Ruby on Rails app: caching, offloading, and data segmentation are usually the first strategies to implement when scaling your application (they’re usually easier).

It probably sounds obvious, but it’s always important to find out what part of your application needs help before you start re-architecting. If you’re having issues with your database, and you build a spiffy disk sharding scheme, you’ve just fixed a problem that doesn’t exist. So, doing the proper discovery will allow you to allocate your efforts for best effect.

Finding your performance hotspots is very important in this process. A hotspot is a point in the architecture where you’re running at high percentages of capacity, or where your application is spending a lot of time. Hotspots are where the flames start. Knowing your points of pain allows you to triage correctly, and to know how to best spend your developers’ time. Using a combination of resource monitoring (like Nagios) and performance introspection (like New Relic) is essential to identifying your Ruby on Rails hotspots.

One of the things to keep in mind is that this process is ongoing. When you clear out one hotspot generally another one will pop up to take its place as you grow. You might be optimizing disk reads and writes one week, and be neck deep in a SQL re-write the next.

If you have a proper staging setup, you can build estimates against generated traffic. This can give you a (blurry) view into what the next hotspot might be. A good process is to capture an hour or so worth of traffic on the live site, and replay it two, three, or more times faster against the staging environment. You want the traffic to be as real as possible. You can even go further and do a formal load test using a tool like browsermob.

When to go deep, and when to go long

After you have killed all of the hotspots you can, and added all the resources you can afford, it’s time to look at the next level. Usually this is when you start to see people thinking about sharding of some manner. There are three major types of sharding at the moment - File System, Database and Application. I’ll touch on each of these topics, starting with the highest level, and hardest.

Ruby on Rails Application Sharding?

Application sharding is the most extreme, provides the most benefit and is the hardest to accomplish. There are several ways to accomplish application sharding.

If you can split your users amongst several vertical groups, you can basically install copies of the application for each segment. This method assumes that users in each group will not need to interact.

For example, if you can segment your user base into three groups who do not really interact, you can simply provision 3 environments and install 3 separate copies of the application. An example of this might be a site hosting application. Each site hosted will not need much (if any) interaction with the other sites hosted. This is by far the easiest method of sharding your application.

You can also look at abstracting any shared logic into a back end service accessible via API. The rule of thumb there is to have each back end application do one thing, and do one thing very quickly. Service oriented architectures (SOA) get this by design.

Alternatively, you can also look at this from a business logic viewpoint. If you can cut your application into portions (say, photos, chat and games for a social site) you can create smaller applications to handle photos, chat and games as well as the shared authentication and user information storage parts. Have the photos, chat and games applications leverage the back end authentication and user information applications to read and write shared information.

This gives us several advantages. For the back end application you can remove all unneeded code (e.g., if you are not going to need to provide views, then remove ActionView), plugins and gems. Keep the app as light as possible, and give each of the application shards dedicated resources (e.g., their own databases).

Another advantage of this approach is that you can start to optimize your hardware spend. If your chat application is half as intensive as your photo and games applications, it’s far easier to assign resources in a targeted fashion and maximize returns. In a monolithic application, if the photo application breaks or needs more resources, the entire stack is affected. With sharding, you get some buffering from some site-wide issues, and the ability to assign resources exactly where they’re needed. The big drawback is that it’s not easy.

Database Sharding

This is another step that can be looked at in certain circumstances. If the amount of data you need to process is very large, or the number of transactions is sinking your database, you can look into database sharding. Basically, you take your database and break the schema up among several database servers. Most major RDBMSs have tools that will take care of this. Informing the application where the data is might be complex, depending on which RDBMS you use.

Filesystem Sharding

If your application is file system IOPS heavy, file system sharding might be the route that you want to look at. Basically you add more hardware disk arrays, and split the reads and writes between them. You need to inject some logic into the save and open functions in your application so that it knows which file system each file is to be saved to and opened from. Usually you can create a hash of the file name, and key off the first couple of characters in the hash. If you’re interested, you can read our more detailed dive into file system sharding.
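As a rough sketch of the "hash the file name and key off the first couple of characters" idea, something like the following could sit behind your save and open calls. The mount points in SHARD_ROOTS and the shard_path helper are assumptions made up for this example, and MD5 is used only as a convenient, evenly distributed hash, not for security.

```ruby
require "digest/md5"

# Hypothetical mount points, one per disk array (an assumption for this sketch).
SHARD_ROOTS = ["/mnt/files0", "/mnt/files1", "/mnt/files2", "/mnt/files3"]

# Maps a file name to the full path on the shard that should hold it.
def shard_path(file_name, roots = SHARD_ROOTS)
  digest = Digest::MD5.hexdigest(file_name)
  prefix = digest[0, 2]                    # first two hex characters: 256 buckets
  root   = roots[prefix.to_i(16) % roots.size]
  File.join(root, prefix, file_name)       # the same name always maps to the same path
end

puts shard_path("avatar.png")
```

Both the save and the open code paths would call shard_path, so a file is always looked up where it was written. Note that with a simple modulo like this, changing the number of roots remaps existing files, so you would need a migration plan before adding arrays.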

That’s No Moon!

Scaling can be a daunting task if you put it off too long. It can mean the difference between a successful business and one that dies. Don’t let that scare you however. Taken in small, bite sized chunks it’s certainly an achievable goal. Make sure that you are working on the right problems, and make sure that you are doing a little throughout the lifespan of your application.

And keep in mind, Scaling is a Discipline, not a Goal. What works great for 20 users:

# Loads every user into memory, then filters in Ruby
users = User.find(:all)
for user in users
  if user.name == "fred"
    user.make_happy
  end
end

may not work as well with 2000, or 20,000. So do the work it takes to make your application work today, and keep in mind the changes you’ll have to make, and the challenges you’ll face tomorrow.
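One likely reason the loop stops scaling is that it loads every user and only then filters in Ruby, so the cost grows with the whole table. A sketch of pushing the filter into the query is below. The :conditions hash is the Rails 2-era finder option; the User class here is only a tiny stand-in, written for this example, so the snippet runs without Rails or a database.

```ruby
# Stand-in for an ActiveRecord model (an assumption for illustration only).
# It imitates just find(:all) and find(:all, :conditions => { ... }).
class User
  ROWS = [{ :name => "fred" }, { :name => "wilma" }, { :name => "fred" }]

  class << self
    attr_accessor :rows_loaded   # counts objects built, to show the difference
  end
  self.rows_loaded = 0

  attr_reader :name, :happy

  def initialize(row)
    @name  = row[:name]
    @happy = false
    self.class.rows_loaded += 1
  end

  def make_happy
    @happy = true
  end

  def self.find(scope, options = {})
    raise ArgumentError, "only :all is imitated" unless scope == :all
    conditions = options[:conditions] || {}
    matching = ROWS.select do |row|
      conditions.all? { |key, value| row[key] == value }
    end
    matching.map { |row| new(row) }
  end
end

# Filter where the data lives: only matching rows become Ruby objects.
freds = User.find(:all, :conditions => { :name => "fred" })
freds.each { |user| user.make_happy }
puts "loaded #{User.rows_loaded} of #{User::ROWS.size} rows"
```

In a real application the same call would turn into a WHERE clause, so the database does the filtering and the application only pays for the rows it needs.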

Getting Started With JRuby

June 15th, 2009 at 9:06AM

In the wake of our recent announcement of JRuby support, we have a guest post from Charlie Nutter of the JRuby team on getting started with JRuby:

“Last week, Engine Yard announced they would soon support running JRuby in their cloud environment. I think I speak for the whole JRuby community when I say how excited we are about this new possibility. JRuby has proven itself a top-notch, production-quality Ruby implementation, and the Engine Yard announcement really made us feel proud of what we’ve accomplished. It also got us thinking about what JRuby really means for Engine Yard customers.

JRuby is, simply put, Ruby on top of the Java virtual machine. While this means you get the benefits of the JVM’s world-class garbage collectors, libraries, and optimizations, it does not mean you have to know Java to use JRuby. We’ve worked very hard to make JRuby look and feel “just like Ruby.” So much so, that these days basically all pure-Ruby libraries should “just work” out of the box. Rails runs great, and there’s dozens of production users out there reaping the benefits of JRuby’s outstanding memory management, native threads (actually running in parallel!), and excellent performance…all of which we continue to improve with every release. JRuby at Engine Yard means you’ll also be able to take advantage of Engine Yard Ruby and Rails expertise, along with the assurances that your application will “just work” in their cloud.

So how do you get started with JRuby? Easy!

  • Download JRuby from http://www.jruby.org. JRuby 1.3.0 is the current release, but you can feel comfortable testing out either 1.3.0 or 1.2.0, the previous release, which several folks already have in production.
  • Unpack it somewhere convenient. You don’t have to install it as root, but you can if you like. And you can have as many separate JRuby installs as you want, alongside any standard Ruby installs already on your system.
  • Put JRuby’s “bin” directory somewhere in your PATH, so you can run the “jruby” command easily.

That’s it! You’re ready to try it out!


What is RubySpec?

June 11th, 2009 at 8:06AM

You might think that What is the meaning of life? is a tough question. But here’s one that will give it a run for its money: What is Ruby?

Ok sure, that comparison is hyperbole, but bear with me. Try this out in your irb session:

>> Float("0.5") == "0.5".to_f
=> true

That’s reasonable enough. Imagine if it weren’t true! But did you know that the Float() method converted the "0.5" text string to a Float object without calling the string’s to_f() method?

What is the definition of Ruby in this situation? Is it that Float() returns a Float object for a validly formatted text string or that Float() does so without calling the string’s to_f() method? Let’s investigate the situation a bit more.

In Ruby, if you define an arbitrary object that you want to behave like a Float object, you define a to_f() method for your object. Then Float() will call that method on your object:

>> s = "0.5"
=> "0.5"
>> def s.to_f() 42 end
=> nil
>> floaty = Object.new
=> #<Object:0x5eb190>
>> def floaty.to_f() 0.5 end
=> nil
>> Float(floaty) == Float(s)
=> true

Now that is surprising. A lot of the elegance of Ruby comes from generally everything being an object. In some sense, floaty and "0.5" are just objects, so why does the Float() method treat them differently?

More importantly, should you rely on Float() not calling your string’s to_f() method, or is that merely an implementation detail of MRI (Matz’s Ruby Implementation)? This is the dilemma faced repeatedly by every alternative Ruby implementation.

Fortunately, we have a powerful tool to assist us.

The RubySpec project is writing an executable definition of the Ruby programming language using RSpec-style specs. The tremendous utility of the specs is that alternate Ruby implementations can run them to determine if they are building a compatible Ruby engine.
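To make "executable definition" a little more concrete, the Float() behavior explored above could be pinned down roughly like this. It is a minimal sketch in plain Ruby: the check helper and RESULTS hash are hypothetical stand-ins for a spec framework, not RubySpec's actual syntax.

```ruby
# Hypothetical mini-harness standing in for a real spec framework.
RESULTS = {}

def check(description)
  passed = yield
  RESULTS[description] = passed ? :pass : :fail
end

check "Float() does not call a String argument's to_f" do
  s = String.new("0.5")
  def s.to_f() 42.0 end      # would change the result if Float() used it
  Float(s) == 0.5
end

check "Float() calls to_f on an arbitrary object" do
  floaty = Object.new
  def floaty.to_f() 0.5 end
  Float(floaty) == 0.5
end

RESULTS.each { |description, result| puts "#{result}: #{description}" }
```

Any implementation can run checks like these and see immediately whether it agrees with the behavior being described.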

Presently, the specs contain over 33,000 precisely defined facets of Ruby behavior. The specs cover Ruby behavior across different platforms, operating systems, and versions of the Ruby language. The goal is to ensure that Ruby applications written to use the core Ruby features covered by the specs will run the same on any Ruby implementation.

RubySpec has been well-known in the community of Ruby implementers for almost two years. Every major Ruby implementation is using it. However, many Ruby programmers are just learning about it. RubySpec has a lot to contribute to the larger Ruby community. Recently, I explained some ideas about this in an interview with Gregory Brown. Greg is starting a project, called Unity, to make the information contained in the RubySpecs more accessible to everyone.

Contributing to RubySpec is a great way to learn more about the Ruby programming language. At the same time, your contribution helps the alternate Ruby implementations and the Ruby ecosystem. In the past couple of weeks, contributors have added tons of fixes to the specs for Ruby 1.8.7 and 1.9. Check out their excellent work at the RubySpec Github repository.

I’ll be speaking about RubySpec at the upcoming Open Source Bridge conference June 17-19. If you have questions about RubySpec that you’d like me to address, please leave a comment.

BigDecimal Vulnerability in Ruby 1.8.6 and 1.8.7

June 10th, 2009 at 9:06AM

Yesterday, the first security vulnerability since Engine Yard took over maintenance of Ruby 1.8.6 was reported. It is a Denial of Service vulnerability in BigDecimal, by which an attacker can cause a segmentation fault by providing a very large number as input. ActiveRecord relies on BigDecimal, but this is not Rails specific.

Today, as part of our maintainer role for 1.8.6, we published a fix as part of Ruby 1.8.6 patch-level 369 and as a part of Ruby 1.8.7 patch-level 173.

The issue was initially discovered and fixed in the Ruby 1.9.1 trunk. We backported the fix to 1.8.6 by writing a test, watching it fail, then making it pass (the same way we always do). As part of our test-driven approach, Kirk Haines then added a test in RubySpec to test for the condition. We ran the test suite on OS X, RedHat Enterprise 3, CentOS 4, 32- and 64-bit Engine Yard Solo instances, and an Engine Yard Slice to verify the fix.

Engine Yard customers have been notified about the vulnerability via email with instructions on how to upgrade. Engine Yard Solo customers can get the new, patched version of Ruby 1.8.6 simply by redeploying their environments. In the future, new Engine Yard deployments will automatically get the new version.

New Solo Release—Server Monitoring, Self-Managed vhost Templates

June 8th, 2009 at 2:06PM

Today we’ve shipped a new release of Engine Yard Solo. This release includes a ton of bug fixes, but also contains a killer new feature.

We’ve enabled server monitoring for all customers!

Once you enable this new feature by hitting ‘Deploy’ in your dashboard to sync our recipes, you will have full server monitoring enabled for a number of important stats:

  • Load Average
  • Free Memory
  • IO-Wait
  • Swap Used
  • Free Disk Space for EBS volumes

Engine Yard Solo server alerts report

We will start collecting server alerts for you and displaying them in your dashboard. This gives you much more insight into what your slices are doing and how healthy they are. The alerts screen is an at-a-glance view of the health of your deployments: if the top of each section is green, you know you are in good shape. If the top of any alert section is yellow or red, you know there may be some issues to look into, and what those issues are!

We’ve also released a way for you to tell our automation system that you have local edits to certain config files that you do not want us to stomp on or touch. This is especially important for some apache or nginx vhost files. Before this change there was no way for your local edits to survive a ‘Deploy’ of our recipes. Now if you want to keep a config file with your own edits you just have to rename it with ‘keep.’ prepended to the name.

For example, say you want to edit the nginx vhost for your application that lives at /data/nginx/servers/myapp.conf. If you make changes to this file, all you need to do is rename it to /data/nginx/servers/keep.myapp.conf, and we will never touch that file again until you rename it back or remove it. This gives you the power to make customizations to your vhost configs without us stepping on them later. And since your vhosts are stored on your /data partition, which is an EBS device, your changes will even persist across terminate/create events as long as you mount the same volumes when you boot your environment back up.
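If you prefer to script the rename, a small helper along these lines would do it. This is just a sketch: keep_path and protect_config are hypothetical helpers written for this example, not part of any Engine Yard tooling.

```ruby
require "fileutils"

# Returns the same path with "keep." prepended to the file name.
def keep_path(path)
  File.join(File.dirname(path), "keep." + File.basename(path))
end

# Renames the config file to its "keep." name and returns the new path.
def protect_config(path)
  target = keep_path(path)
  FileUtils.mv(path, target)
  target
end

puts keep_path("/data/nginx/servers/myapp.conf")
```

On a slice you would call protect_config("/data/nginx/servers/myapp.conf") after making your edits; renaming the file back hands it over to the automation again.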

I think these features make the platform much more useful and I hope you enjoy using them.

Http Digest Auth: Vulnerability in Rails 2.3.1/2

June 4th, 2009 at 5:06PM

If you are using Ruby on Rails 2.3.1 or 2.3.2, using HTTP *digest* authentication, and setting the username / password via a hash, then you are affected by this vulnerability. It allows users to bypass HTTP authentication without a valid password.

Please read the full posting on the Rails Security Group for more details and the appropriate workaround to implement in your code, until the official fix is available in the 2.3.3 release.

(Engine Yard customers have already been contacted via email about this vulnerability).
