Archive for June, 2009

6 Steps To Refactoring Rails (for Mere Mortals)

By Yehuda Katz | June 29th, 2009 at 9:06AM

Since December, Rails has undergone a fairly significant internal refactoring in quite a number of areas. While it was quite tricky at first, we mere mortals have started to hone a process for diving into a new area of the codebase and emerging some time later with a much improved area that does basically the same thing. Here’s the approach we’ve adopted and advocate:

First, refactoring needs to be refactoring, not revision. By that I mean that while you are in the process of invasively improving some code, it is not the appropriate time to also change the functionality of that code. If you do both at the same time, it will be difficult to track down whether a bug in the code is the result of refactoring or functionality changes.

We’ve held fast to this requirement for the Rails 3 work Carl and I have been doing, which has resulted in an extremely stable edge, despite making fairly invasive changes.

Second, any kind of significant refactoring without tests is folly. The first thing you should do is take a look at the test suite for the area in question and beef it up if necessary.

Thankfully, Rails has a fairly reasonable test suite, and the addition of Sam Ruby’s Agile Web Development on Rails test suite has provided an additional level of confidence in the changes we’re making.

Third, once you’re ready to dive in, read through the code carefully. It can be tempting to just go in and hack away at a particularly egregious part of the codebase, but you’ll frequently be changing code that exists for a reason.

Something I’ve noticed both in Rails applications and in Rails itself is that code that looks very strange at the beginning of a period of refactoring tends to exist for a reason.

Fourth, as you proceed, make very small changes, then run the full test suite after every change. Commit often. What you want to look for is cases where the boundary APIs around the code you’re writing are messy (so you have multiple ways in to a particular class or area of code where one would suffice).

One Rails example would be rendering a template in ActionView from ActionController. When I started in December, ActionController called into ActionView using a number of public and private APIs, so making any changes around those boundaries was very tricky. Some things we wanted to do, like improving the way layouts were selected, were too complex because of the number of ways templates and layouts were rendered.

The very first thing I did in the early days of the merge was work toward reducing the number of ways that ActionController told ActionView to render a template. In the end, we settled on just a single API: render_template_from_controller, which takes a Template object for the template to render, and a Template object for the layout. Once this was done, it became a lot easier to make changes on either side of the boundary, without fear that a small change in ActionView could break any number of things in ActionController.
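
To make the shape of that boundary concrete, here is a minimal sketch. It is not the actual Rails code: the Template, View, and Controller classes and the {{yield}} placeholder are hypothetical stand-ins, and only the method name render_template_from_controller comes from the description above. The point is simply that the controller reaches the view through exactly one method.

```ruby
# Hypothetical stand-ins for illustration; not the real ActionView classes.
Template = Struct.new(:name, :source)

class View
  # The single entry point the controller is allowed to call.
  # Takes a Template for the body and an optional Template for the layout.
  def render_template_from_controller(template, layout = nil)
    body = render_source(template)
    return body unless layout

    # Block form of sub avoids any special handling of the replacement text.
    render_source(layout).sub("{{yield}}") { body }
  end

  private

  # Internal detail: free to change without touching the controller.
  def render_source(template)
    template.source
  end
end

class Controller
  def initialize(view)
    @view = view
  end

  def show
    page   = Template.new("show", "<p>Hello</p>")
    layout = Template.new("application", "<html>{{yield}}</html>")
    @view.render_template_from_controller(page, layout)
  end
end

puts Controller.new(View.new).show
```

Because everything below render_template_from_controller is private in this sketch, the view's internals can be reshuffled without the controller noticing.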

Of course, this assumes that you understand what your boundaries are. This is something that’s learned over time, but a fundamental requirement in good refactoring is having functionality broken up into units that are easy to understand, with small surface area. This is commonly achieved using classes, which is a good starting point, but Ruby has other tricks up its sleeve as you get more advanced, like judicious use of modules (and the new Rails ActiveSupport::Concern).

Fifth, once you have reasonable boundaries, dive in and start making changes. A pretty good rule of thumb is to clean up cases where a public API has started being used for private, internal use. This might mean that changing the internals of your code breaks the public functionality (which, again, should be sacrosanct during this process). Have a zero-tolerance policy for failing tests as you make small changes, especially as you separate out public and private functionality.

One example of this in Rails was extensive usage of ActionView’s public render method by private functionality. As a result, the public render method had snippets of code inside to handle special cases (like render :file taking a Template object). The solution in this case was to extract out the private functionality, and have the public render method as well as the private internals call the new extracted methods. This ensures that internal functionality is kept internally, where it can be refactored more easily.
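
A rough sketch of that extraction, under loud assumptions: FileTemplate, TemplateRenderer, the TEMPLATES table, and the [body] placeholder are all invented for illustration and do not mirror the real ActionView internals. What it tries to show is the public render method and an internal caller both delegating to one extracted private method, so the public method no longer needs special cases for Template objects.

```ruby
# Hypothetical sketch; none of these names come from Rails itself.
FileTemplate = Struct.new(:source)

class TemplateRenderer
  # Invented lookup table standing in for template files on disk.
  TEMPLATES = { "greeting" => FileTemplate.new("Hello") }.freeze

  # Public API: accepts only the documented option, a file name.
  def render(options)
    render_template(find_template(options.fetch(:file)))
  end

  # Stands in for framework-internal code that already holds template
  # objects. It calls the extracted method directly instead of passing
  # a template object through the public render method.
  def render_with_layout(template, layout)
    render_template(layout).sub("[body]") { render_template(template) }
  end

  private

  def find_template(name)
    TEMPLATES.fetch(name)
  end

  # Extracted internal method shared by the public API and internal callers.
  def render_template(template)
    template.source
  end
end

renderer = TemplateRenderer.new
puts renderer.render(:file => "greeting")
puts renderer.render_with_layout(FileTemplate.new("Hi"), FileTemplate.new("<div>[body]</div>"))
```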

Sixth, don’t be afraid to git reset --hard if you find yourself sinking into quicksand, with rising confusion due to changes you made. Over the course of working with Rails, I’ve lost an hour or more at a time to changes made too rapidly and carelessly, and the only advice I can give is to give up on ratholes as early as you notice them.

So that’s it. Six easy steps to refactoring Rails.

Pair-Programming Should Be Co-Programming

By Joe Arnold | June 25th, 2009 at 7:06AM

Back in 2005 a pair of Stanford students asked me if they could observe the pair-programming environment at the company I was working for. They were working on a project to challenge the notion that two people pair-programming had separate roles of “driver” and “navigator,” a common notion of how pair-programming should work at the time. Back then, we had a traditional pair-programming setup: 1 desk, 1 keyboard and mouse, 1 computer and (of course) 2 people. What they observed was downright painful!

As one example, they recorded a session where someone was verbally dictating syntax and keyboard actions to their pair:

Hugh: So…
Ilya: Parenthesis. So percent, getNewArgs… [Hugh types.] Exactly. So save off those two lines in the new method.
Hugh: Uh…
Ilya: Right…down, down, down, there we go.
Hugh: So we…
Ilya: So, percent getNewArgs equals percent args [Hugh types this line to terminal.] Uh, I think that’s it.
Hugh: This?
Ilya: Yeah, that’s all we want to do. Get rid of the blank line and close the new.

From a pair-programming session revealing the perils when one person “drives” while the other “navigates.” Excerpt from “The Social Dynamics of Pair Programming.”

This was clearly the wrong way to go about pair-programming. “Driver” and “navigator” was turning out to be closer to “driver” and “back-seat driver,” and like all experiences of back-seat driving, it could be frustrating for the driver and generally unproductive. What we’ve found at Engine Yard is that it’s far better to optimize the pair-programming environment not for a “driver” and “navigator,” but for co-programming.

Jon Crosby and Ezra Zygmuntowicz pair-programming at Engine Yard

A good co-programming environment should reduce the friction for any task, and has three rules:

  1. Create a shared environment, where the pair can fully immerse itself in the problem at hand.
  2. Make it easy for a member of the pair to ‘fork’ off and not interrupt flow.
  3. Remove any obstacles that get in the way of completing each other’s syntax (and sentences).

The Engine Yard Pair-Programming Setup

Two keyboard and mouse sets: This alone dramatically improves a pairing environment. Often we see one member of the pair ‘hovering’ over the keyboard — a non-verbal cue indicating that they want to take over. It’s amazing how effective code can be in expressing an idea over a verbal description or notepad sketches. No more oral syntax descriptions!

Dedicated pair-workstations: Identical workstations with identical configurations, including editor. We use iMacs with nice big screens. Similar environments make it easy for pair switch-up. Everyone is familiar with the environment on the pair-stations, so there is no re-learning a new environment depending on who you happen to be pairing with.

3-Computer Setup: Each member of the pair brings their laptop to dock alongside the pairing station. This lets either of them do research or kick off long-running processes without losing context on the dedicated workstation. While it takes more discipline to stay on task, we think it’s worth the flexibility.

I don’t claim that this is the perfect environment for all situations; but it’s something that works well for us.

Introduction to BDD with Cucumber

By Dave Astels | June 23rd, 2009 at 7:06AM

Cucumber is a framework for writing and executing high level descriptions of your software’s functionality. Call these tests, examples, specifications, whatever… it doesn’t matter too much. What I’m talking about has traditionally been called functional, integration, and/or system tests. In XP terms this includes tests called Story Tests, Customer Tests, and/or Acceptance Tests.

One of Cucumber’s most compelling features is that it provides the ability to write these descriptions using plain text in your native language. Cucumber’s language, Gherkin, is usable in a growing variety of human languages, including LOLZ. The advantage of this is that these feature descriptions can be written and/or understood by non-technical people involved in the project.

One important thing to keep in mind is that Cucumber is NOT a replacement for RSpec, test/unit, etc. It is not a low level testing/specification framework.

Cucumber plays a central role in a development approach called Behaviour Driven Development (BDD).

A Bit About BDD

Dan North describes BDD as “writing software that matters” [in The RSpec Book] and outlines 3 principles:

  1. Enough is enough: do as much planning, analysis, and design as you need, but no more.
  2. Deliver stakeholder value: everything you do should deliver value or increase your ability to do so.
  3. It’s all behavior: everyone involved should have the same way of talking about the system and what it does.

BDD in its grandest sense is about communication and viewing your software as a system with behaviour. BDD tools such as RSpec and Cucumber strive to enable you to describe the behavior of your software in a very understandable way: understandable to everyone involved. (more…)

A Quick Primer on Sharding for Ruby on Rails

By Greg Nokes | June 18th, 2009 at 7:06AM

Sharding is usually the final strategy to reach for when scaling a Ruby on Rails app: caching, offloading, and data segmentation are usually the first strategies to implement when scaling your application (they’re usually easier).

It probably sounds obvious, but it’s always important to find out what part of your application needs help before you start re-architecting. If you’re having issues with your database, and you build a spiffy disk sharding scheme, you’ve just fixed a problem that doesn’t exist. So, doing the proper discovery will allow you to allocate your efforts for best effect.

Finding your performance hotspots is very important in this process. A hotspot is a point in the architecture where you’re running at high percentages of capacity, or where your application is spending a lot of time. Hotspots are where the flames start. Knowing your points of pain allows you to triage correctly, and to know how to best spend your developers’ time. Using a combination of resource monitoring (like Nagios) and performance introspection (like New Relic) is essential to identifying your Ruby on Rails hotspots.

One of the things to keep in mind is that this process is ongoing. When you clear out one hotspot generally another one will pop up to take its place as you grow. You might be optimizing disk reads and writes one week, and be neck deep in a SQL re-write the next.

If you have a proper staging setup, you can build estimates against generated traffic. This can give you a (blurry) view into what the next hotspot might be. A good process is to capture an hour or so worth of traffic on the live site, and replay it two, three, or more times faster against the staging environment. You want the traffic to be as real as possible. You can even go further and do a formal load test using a tool like browsermob.

When to go deep, and when to go long

After you have killed all of the hotspots you can, and added all the resources you can afford, it’s time to look at the next level. Usually this is when you start to see people thinking about sharding of some manner. There are three major types of sharding at the moment – File System, Database and Application. I’ll touch on each of these topics, starting with the highest level, and hardest.

Ruby on Rails Application Sharding?

Application sharding is the most extreme, provides the most benefit and is the hardest to accomplish. There are several ways to accomplish application sharding.

If you can split your users amongst several vertical groups, you can basically install copies of the application for each segment. This method assumes that users in each group will not need to interact.

For example, if you can segment your user base into three groups who do not really interact, you can simply provision 3 environments and install 3 separate copies of the application. An example of this might be a site hosting application. Each site hosted will not need much (if any) interaction with the other sites hosted. This is by far the easiest method of sharding your application.

You can also look at abstracting any shared logic into a back end service accessible via API. The rule of thumb there is to have each back end application do one thing, and do one thing very quickly. Service oriented architectures (SOA) get this by design.

Alternatively, you can also look at this from a business logic viewpoint. If you can cut your application into portions (say, photos, chat and games for a social site) you can create smaller applications to handle photos, chat and games as well as the shared authentication and user information storage parts. Have the photos, chat and games applications leverage the back end authentication and user information applications to read and write shared information.

This gives us several advantages. For the back end application you can remove all unneeded code (e.g., if you are not going to need to provide views, then remove ActionView), plugins and gems. Keep the app as light as possible, and give each of the application shards dedicated resources (e.g., their own databases).

Another advantage of this approach is that you can start to optimize your hardware spend. If your chat application is half as intensive as your photo and games applications, it’s far easier to assign resources in a targeted fashion and maximize returns. In a monolithic application, if the photo application breaks or needs more resources, the entire stack is affected. With sharding, you get some buffering from some site-wide issues, and the ability to assign resources exactly where they’re needed. The big drawback is that it’s not easy.

Database Sharding

This is another step that can be looked at in certain circumstances. If the amount of data you need to process is too large, or the number of transactions is sinking your database, you can look into database sharding. Basically, you take your database and break the schema up among several database servers. Most major RDBMSs have tools that will take care of this for you. Informing the application where the data is might be complex, depending on which RDBMS you use.
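
One way to picture “informing the application where the data is” is a simple lookup from table name to server. This is only a sketch: the table names, host names, and the server_for helper are made up, and a real setup would lean on whatever your RDBMS and database adapter provide.

```ruby
# Hypothetical mapping of tables to database servers; all names are invented.
TABLE_TO_SERVER = {
  "users"    => "db-users.example.internal",
  "photos"   => "db-photos.example.internal",
  "messages" => "db-chat.example.internal"
}.freeze

# Tables that have not been split out stay on the main server.
DEFAULT_SERVER = "db-main.example.internal"

# Returns the host that holds the given table.
def server_for(table)
  TABLE_TO_SERVER.fetch(table.to_s, DEFAULT_SERVER)
end

puts server_for(:photos)
puts server_for("sessions")
```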

Filesystem Sharding

If your application is file system IOPS heavy, file system sharding might be the route that you want to look at. Basically you add more hardware disk arrays, and split the reads and writes between them. You need to inject some logic into the save and open functions in your application so that it knows which file system each file is to be saved to and opened from. Usually you can create a hash of the file name, and key off the first couple of characters in the hash. If you’re interested, you can read our more detailed dive into file system sharding.
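
Here is a small sketch of the hash-and-prefix idea. The mount points and the shard_path helper are hypothetical, and MD5 is just one reasonable choice of hash; the sketch keys off the first two hex characters and uses them to pick a disk array.

```ruby
require "digest/md5"

# Hypothetical mount points, one per disk array.
MOUNT_POINTS = ["/mnt/array0", "/mnt/array1", "/mnt/array2", "/mnt/array3"].freeze

# Picks a mount point from the first two hex characters of the MD5 of the
# file name, and uses the same prefix as a subdirectory on that array.
def shard_path(file_name)
  prefix = Digest::MD5.hexdigest(file_name)[0, 2] # "00" through "ff"
  mount  = MOUNT_POINTS[prefix.to_i(16) % MOUNT_POINTS.size]
  File.join(mount, prefix, file_name)
end

# The same name always maps to the same location, so the save and open
# code paths can both call shard_path and agree on where the file lives.
puts shard_path("avatar.png")
```

Note that with a modulo scheme like this one, changing the number of arrays sends existing file names to different shards, so the layout is worth planning up front.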

That’s No Moon!

Scaling can be a daunting task if you put it off too long. It can mean the difference between a successful business and one that dies. Don’t let that scare you, however. Taken in small, bite-sized chunks, it’s certainly an achievable goal. Make sure that you are working on the right problems, and make sure that you are doing a little throughout the lifespan of your application.

And keep in mind, Scaling is a Discipline, not a Goal. What works great for 20 users:

# Loads every user into memory, then filters in Ruby.
users = User.find(:all)
for user in users
  if user.name == "fred"
    user.make_happy
  end
end

may not work as well with 2000, or 20,000. So do the work it takes to make your application work today, and keep in mind the changes you’ll have to make, and the challenges you’ll face tomorrow.
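
To see why, here is a plain-Ruby sketch that counts how many rows get turned into objects under each approach. FakeUser and FakeUserTable are invented stand-ins, not ActiveRecord, and the in-memory index only plays the role that a database condition and index would play in a real application.

```ruby
# Plain-Ruby stand-in for a users table; not ActiveRecord.
FakeUser = Struct.new(:name)

class FakeUserTable
  attr_reader :rows_loaded

  def initialize(names)
    @names = names
    @index = names.group_by { |name| name } # plays the role of a database index
    @rows_loaded = 0
  end

  # Like loading every user: each row becomes an object.
  def all
    materialize(@names)
  end

  # Like a query with a condition on name: only matching rows become objects.
  def where_name(name)
    materialize(@index.fetch(name, []))
  end

  private

  def materialize(names)
    @rows_loaded += names.size
    names.map { |name| FakeUser.new(name) }
  end
end

names = ["fred", "fred"] + (1..1998).map { |i| "user#{i}" }

slow = FakeUserTable.new(names)
slow_matches = slow.all.select { |user| user.name == "fred" }

fast = FakeUserTable.new(names)
fast_matches = fast.where_name("fred")

puts "filter in Ruby:  #{slow_matches.size} matches, #{slow.rows_loaded} rows loaded"
puts "filter in store: #{fast_matches.size} matches, #{fast.rows_loaded} rows loaded"
```

With 2,000 rows, the first approach loads all 2,000 to find two matches, while the second loads only the two it needs.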

Getting Started With JRuby

By Engine Yard | June 15th, 2009 at 9:06AM

In the wake of our recent announcement of JRuby support, we have a guest post from Charlie Nutter of the JRuby team on getting started with JRuby:

“Last week, Engine Yard announced they would soon support running JRuby in their cloud environment. I think I speak for the whole JRuby community when I say how excited we are about this new possibility. JRuby has proven itself a top-notch, production-quality Ruby implementation, and the Engine Yard announcement really made us feel proud of what we’ve accomplished. It also got us thinking about what JRuby really means for Engine Yard customers.

JRuby is, simply put, Ruby on top of the Java virtual machine. While this means you get the benefits of the JVM’s world-class garbage collectors, libraries, and optimizations, it does not mean you have to know Java to use JRuby. We’ve worked very hard to make JRuby look and feel “just like Ruby.” So much so, that these days basically all pure-Ruby libraries should “just work” out of the box. Rails runs great, and there are dozens of production users out there reaping the benefits of JRuby’s outstanding memory management, native threads (actually running in parallel!), and excellent performance…all of which we continue to improve with every release. JRuby at Engine Yard means you’ll also be able to take advantage of Engine Yard’s Ruby and Rails expertise, along with the assurance that your application will “just work” in their cloud.

So how do you get started with JRuby? Easy!

  • Download JRuby from http://www.jruby.org. JRuby 1.3.0 is the current release, but you can feel comfortable testing out either 1.3.0 or 1.2.0, the previous release, which several folks already have in production.
  • Unpack it somewhere convenient. You don’t have to install it as root, but you can if you like. And you can have as many separate JRuby installs as you want, alongside any standard Ruby installs already on your system.
  • Put JRuby’s “bin” directory somewhere in your PATH, so you can run the “jruby” command easily.

That’s it! You’re ready to try it out!

(more…)
