Blog

Archive for July, 2009

Programming Contest! Win iPhone 3GS & $2,000 Cloud Credit

By | July 14th, 2009 at 9:07AM

We’re kicking off a programming contest today that is sure to challenge even the most comp-sci heavy engineers out there, and we’re excited to see what you all come up with. With the difficulty of the challenge in mind, we’ve got some great prizes for the winner: an iPhone 3GS AND $2,000 of Cloud (Flex or Solo) credit. Now to jump right in and answer all your questions…

What is the contest?

You must tweet a sequence of twelve words that when hashed is bit-wise closest to a hash of a challenge phrase that we will announce the morning of July 20th.  All words must be from a 1,000 word dictionary we will provide at that same time. You are allowed to append up to five random characters to the end of your entry. We’re pretty confident you’ll want to write a program to automate the finding of close matches, so announcing this a week in advance should give you enough time to get your programs up and running.

How do I enter?

To enter the contest, follow @engineyard on Twitter and tweet your best candidate word sequence before 6pm Pacific Time on July 21st. This means you have about 30 hours between the availability of the challenge phrase and dictionary, and the entry submission cut-off time.

As previously mentioned, the winner of the contest will get an iPhone 3GS AND $2,000 of Cloud (Flex or Solo) credit. [You can also choose an alternative of load test credits worth $2,000 from BrowserMob.] Second prize is another iPhone 3GS.

So how does it work exactly? (Update! Example Now Clearer!)

Let’s take an example: a dictionary excerpt is: “Cloud, Ruby, DHH, one, eight, six, active, record, controller, data, rspec, mongrel, MySQL, postgresSQL, tokyo, MRI, jruby, rubinius, memcached, exception, metaprogramming, reflection.” Let’s also say we announce that the challenge phrase is I am not a big believer in fortune telling

To submit a contest entry, you would follow us on Twitter and tweet your best entry, e.g.:
“@engineyard Rubinius one eight six active active record memcached exception JRuby DHH TOKYO sdfe3”

We will take the SHA-1 hash of this phrase: Rubinius one eight six active active record memcached exception JRuby DHH TOKYO sdfe3 which hashes to cd36b6dc8d4ed51b36dd7fce08f500392a7fb782 and compare it to the SHA-1 hash of I am not a big believer in fortune telling (which hashes to: 6cac827bae250971a8b1fb6e2a96676f7a077b60).

When we say “compare,” we mean that we will take the Hamming distance between the two hashes: the number of bit positions at which they differ once the hex hashes are converted to binary.

For example, here the binary of cd36...etc. is:
1100110100110110...etc.
and the binary of 6cac...etc. is:
0110110010101100...etc.

So calculating the Hamming distance is done as follows:
- first bit of each hash (1 vs. 0) doesn't match -> +1 to Hamming distance
- second bit of each hash (1 vs. 1) does match -> no change to Hamming distance
- third bit of each hash (0 vs. 1) doesn't match -> +1 to Hamming distance

etc.

In the case of the complete example hashes above, the Hamming distance is 74. If you are the submitter with the lowest Hamming distance, you win the prizes – it’s that simple ;)
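
If you want to check your own distance function against the example above, here is a minimal Ruby sketch. It assumes both hashes arrive as 40-character hex strings; the method names are ours for illustration and not part of any contest tooling.

```ruby
require "digest"

# Number of differing bits between two equal-length hex strings.
def hamming_distance(hex_a, hex_b)
  raise ArgumentError, "hashes must be the same length" unless hex_a.length == hex_b.length
  # Read each string as a hexadecimal number, XOR them, and count the 1 bits.
  (hex_a.to_i(16) ^ hex_b.to_i(16)).to_s(2).count("1")
end

# Convenience wrapper (hypothetical name): hash two phrases, then compare.
def phrase_distance(entry, challenge)
  hamming_distance(Digest::SHA1.hexdigest(entry), Digest::SHA1.hexdigest(challenge))
end

entry_hash     = "cd36b6dc8d4ed51b36dd7fce08f500392a7fb782"
challenge_hash = "6cac827bae250971a8b1fb6e2a96676f7a077b60"
puts hamming_distance(entry_hash, challenge_hash) # 74
```

XOR-ing the two numbers leaves a 1 in every position where the hashes disagree, so counting the 1s gives the distance without walking the bits by hand.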

Extra Prize: If you manage to achieve a Hamming distance of zero, we’ll throw in a MacBook Pro: you are either highly improbable, have mad algorithm cracking skills or you work for the NSA, any of which makes you cool enough to deserve random goodness and recognition. Note: we know the probability of anyone getting to a zero Hamming distance is truly vanishingly small, but we wanted to acknowledge anyone making it there!

There are some obvious brute force strategies to win this prize. We’d suggest building a really fast word permutation algorithm, and finding a fast SHA-1 hash algorithm. Then find a way to get your hands on a whole bunch of computation for the roughly 30 hours that the contest will run (hmm… perhaps the cloud would be useful).
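
To give a feel for the shape of such a program, here is a deliberately naive sketch. It uses random sampling rather than an exhaustive word permutation, and the dictionary, challenge phrase and suffix alphabet below are stand-ins we made up for illustration. A serious entry would need something far faster.

```ruby
require "digest"

# Hypothetical stand-ins: the real dictionary and challenge phrase come later.
DICTIONARY   = %w[cloud ruby dhh one eight six active record controller data rspec mongrel].freeze
CHALLENGE    = "I am not a big believer in fortune telling"
SUFFIX_CHARS = [*"a".."z", *"A".."Z", *"0".."9"].freeze

def hamming_distance(hex_a, hex_b)
  (hex_a.to_i(16) ^ hex_b.to_i(16)).to_s(2).count("1")
end

# One random candidate: twelve dictionary words with random capitalization,
# followed by a five-character suffix.
def random_entry(rng)
  words = Array.new(12) do
    DICTIONARY.sample(random: rng).chars.map { |c| rng.rand(2).zero? ? c.upcase : c.downcase }.join
  end
  suffix = Array.new(5) { SUFFIX_CHARS.sample(random: rng) }.join
  (words << suffix).join(" ")
end

# Try `attempts` random candidates and return the closest one with its distance.
def search(attempts, rng = Random.new)
  target = Digest::SHA1.hexdigest(CHALLENGE)
  best_entry = nil
  best_distance = 161 # one more than the 160 bits in a SHA-1 hash
  attempts.times do
    entry = random_entry(rng)
    distance = hamming_distance(Digest::SHA1.hexdigest(entry), target)
    best_entry, best_distance = entry, distance if distance < best_distance
  end
  [best_entry, best_distance]
end

entry, distance = search(10_000)
puts "#{distance}: #{entry}"
```

Because each candidate is independent of the others, this kind of search splits trivially across as many machines as you can get hold of.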

More details and conditions:

  1. Only US ASCII printable characters in your custom five character string please (we really want to avoid Unicode rat-holes)
  2. The words in your string must be single space separated; no other punctuation is allowed
  3. Spamming new entries as you find better ones is a fail whale; limit your contest tweets to a maximum of five
  4. The dictionary will be a Macintosh TextEdit file in RTF format, with each word on a separate line, and no white-space.
  5. In the case of a tied Hamming distance (entirely possible), the winner will be chosen by lottery among people with the best distance
  6. You may permute capitalization for the dictionary words (i.e. you may use Ruby, rUby, RUBY, and RUBy)
  7. Please scrub your custom five character string for the five words you can’t say on television
  8. If the exact same string is submitted multiple times, only the first submission counts
  9. Employees and contractors of Engine Yard, and their family members are not eligible
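
Before tweeting, it may be worth checking a candidate against the formatting rules above. The following is a rough sketch only: it assumes the optional custom string is separated from the twelfth word by a single space (as in our example tweet) and contains no spaces itself, and the sample dictionary is hypothetical. It does not attempt rule 7.

```ruby
# Hypothetical sample; the real list is the 1,000-word contest dictionary.
SAMPLE_DICTIONARY = %w[rubinius one eight six active record memcached exception jruby DHH tokyo].freeze

# True when `entry` is twelve dictionary words (any capitalization), separated
# by single spaces, optionally followed by a space and one to five printable,
# non-space US-ASCII characters.
def valid_entry?(entry, dictionary = SAMPLE_DICTIONARY)
  known = dictionary.map(&:downcase)
  # Splitting on a regex (not " ") keeps empty tokens, so doubled spaces are caught.
  tokens = entry.split(/ /, -1)
  return false unless [12, 13].include?(tokens.length)
  return false unless tokens.first(12).all? { |word| known.include?(word.downcase) }
  suffix = tokens[12]
  suffix.nil? || suffix.match?(/\A[\x21-\x7e]{1,5}\z/)
end

puts valid_entry?("Rubinius one eight six active active record memcached exception JRuby DHH TOKYO sdfe3") # true
```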

Okay, so maybe “It’s that simple” was a bit, well, over-simplified, but we’re confident we’ll get some great submissions. The Ruby community is nothing if not persistent, creative and intelligent — so show us what you’ve got!

Miscellaneous clarifications

1) When you convert the hexadecimal hash to binary, you need to convert each hex digit to the equivalent number in binary, e.g. “c” = “12” in decimal = “1100” in (big-endian) binary. Make sure that your binary conversion function is NOT treating “c” as the ASCII letter “c”, which gets you the completely different answer of “99” in decimal (0x63) or “01100011” in binary.
2) Be careful if you end up using string hashing functions in C. Remember that C strings are null-terminated, and from reports we’re getting, at least some string functions out there feed the null terminator (“\0”) into the hash function as input. This will get you in trouble because we will not be including a null terminator when we calculate the hash of your tweets. Naturally, we will treat your tweet as a (sane) Ruby string.
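
Both pitfalls are easy to see in a few lines of Ruby. This is just an illustration of the two clarifications above; the variable names are ours.

```ruby
require "digest"

# Treat "c" as a hexadecimal digit: value 12, four bits.
as_hex_digit = "c".hex.to_s(2).rjust(4, "0")

# Treat "c" as an ASCII character: code 99 (0x63), eight bits.
as_ascii = "c".ord.to_s(2).rjust(8, "0")

puts as_hex_digit # 1100
puts as_ascii     # 01100011

# Converting a whole 40-digit hash: pad to 160 bits so leading zeros survive.
hash = "075a32acb1816b570607189475ebbbaccce8b79f"
bits = hash.to_i(16).to_s(2).rjust(hash.length * 4, "0")
puts bits.length  # 160

# A trailing NUL byte is part of the input and changes the hash completely.
puts Digest::SHA1.hexdigest("abc") == Digest::SHA1.hexdigest("abc\0") # false
```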

Another example for folks

People have asked for another example to test their hashing algorithms so here it is:

Example challenge phrase #2:
What you write today will become legacy
which hashes to:
7f83e6b422af5ca4e3112486aea3e702e98a894e or in hex to binary (big-endian):
0111 1111 1000 0011 1110 0110 1011 0100 0010 0010 1010 1111 0101 1100 1010 0100 1110 0011 0001 0001 0010 0100 1000 0110 1010 1110 1010 0011 1110 0111 0000 0010 1110 1001 1000 1010 1000 1001 0100 1110

Example contest entry tweet #2:
@engineyard RuBy one eight six rspec mongrel MRI jruby jruby memcached exception reflection utf8E

We will take the hash of RuBy one eight six rspec mongrel MRI jruby jruby memcached exception reflection utf8E
which hashes to:
075a32acb1816b570607189475ebbbaccce8b79f or in hex to binary (big-endian):
0000 0111 0101 1010 0011 0010 1010 1100 1011 0001 1000 0001 0110 1011 0101 0111 0000 0110 0000 0111 0001 1000 1001 0100 0111 0101 1110 1011 1011 1011 1010 1100 1100 1100 1110 1000 1011 0111 1001 1111

The Hamming distance between these two hashes (again, remember to treat the hash as HEXADECIMAL, NOT ASCII) is 80.

Example dictionary file: sampledictionary

Evolving Rails: Retaining Backward Compatibility

By | July 13th, 2009 at 10:07AM

When evolving a codebase, there are two kinds of changes. The first is an innocuous internal change that nobody else is relying on — aka a true refactoring as per Martin Fowler’s canonical definition. The second is a public-facing change that will impact many others.

When I say “others,” I am referring both to other parts of your own codebase as well as other users entirely. In a web app, your public API is usually the set of URLs that users can use in order to interact with your application. It might also include a client library that you distribute in order to facilitate the use of your API.

In a library or plugin, your public API is the set of methods that applications use to make use of the functionality you expose. In the case of Rails, our public API is made up of methods like before_filter, render, and fresh_when.

Version and Document Your Changes

As with Rails itself, you want to make changes to your public APIs in explicit versions. In the case of APIs that are made up of publicly available URLs, this means versioning your API URLs. (John Barnette has a very good write-up on this.)

If you’re releasing a library, be it a gem or a Rails plugin, release a new minor version (1.2) of your library if it contains changes to the public API. Release a new major version (2.0) of your library if it breaks existing APIs.

If You Must Break Compatibility, Deprecate First

Ruby is an extremely flexible language. If you must break backward compatibility, first release a minor version that supports both APIs, marking the old one as deprecated, then remove the deprecated API in a future version.

This is something Rails has done reasonably well for a number of years. For example, Rails deprecated support for templates with .rhtml extensions several releases ago, replacing them with templates ending with .html.erb. Both options were supported, but Rails 3 (a major release) will finally remove support for the old API.

In terms of URL APIs, you can achieve this by keeping around support for older versions of the API, but phasing out versions as you move forward. For instance, it might be ok to drop support for version 1 of such an API when version 3 is released.

Make sure the users of your code receive deprecation warnings when they use the older API, so they are aware that they are using code with a limited shelf-life.
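
In plain Ruby, the simplest version of this is to keep the old method around as a thin wrapper that warns and then delegates. The class and method names below are hypothetical, and this is only a sketch of the idea, not how Rails implements its own deprecations.

```ruby
# Hypothetical library class, used only for illustration.
class Report
  # New API.
  def render_html
    "<h1>Report</h1>"
  end

  # Old API, kept for one more minor version: warn on stderr, then delegate.
  def to_html
    warn "DEPRECATION WARNING: Report#to_html is deprecated and will be " \
         "removed in 2.0; use Report#render_html instead."
    render_html
  end
end

puts Report.new.to_html # still works, but the caller sees the warning
```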

Set Expectations Correctly

If you have a public API, make sure your users know the approximate schedule for expected changes. If you plan to release new versions of the API every few months, and remove support for versions that are more than two versions old, communicate that with your users. That way, they’ll know when to build in time to keep their code humming along.

In the case of a library, make sure users know what to expect when they upgrade to a new version. Rails typically does not break backward compatibility with the public-facing API in minor versions, although it does add deprecation warnings. The major releases (1.0, 2.0 and the upcoming 3.0) are where larger changes happen, mostly by removing support for APIs that were deprecated in previous releases.

Make Use of Ruby’s Flexibility

Ruby is a very powerful and flexible language, and allows you to easily support legacy APIs in an isolated way. For Rails 3.0, we were pretty sure we would need to break a non-trivial amount of backwards compatibility (after all, ActionController on Rails edge is mostly a new codebase). It turned out, however, that we were able to maintain backwards compatibility for all but the most obscure cases.

For instance, we have removed support for render :layout => nil, which was a synonym for render :layout => true in Rails 2.x, but the opposite of render :layout => false.

However, we have maintained support for things like removing the leading slash in render :template => "/foo" and making render :layout => "layouts/foo" the equivalent of render :layout => "foo".

In order to keep the changes isolated, we created a new Compatibility module which is included in ActionController::Base by default but can be opted-out-of to remove features that we plan to remove. An example:

def _find_layout(name, details)
  # A name like "layouts/foo" already carries the prefix, so clear the default one
  details[:prefix] = nil if name =~ /\blayouts/
  super
end

In this case, the Compatibility module overrides the default _find_layout method, and handles the case where the user included “layouts” in the layout name they specified. We then call super, which runs the original behavior after we’ve normalized the data.

The key takeaway is that one way of handling backwards compatibility is to use Ruby’s flexibility to normalize the old functionality into the new functionality in an isolated place. In the case of Rails, we know that we can remove the Compatibility module wholesale when we next do a major version bump.
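
Here is the same pattern reduced to a self-contained sketch outside Rails. The Layouts module, the Controller class and the find_layout method are made up for illustration; only the structure (a compatibility module that normalizes input and then calls super) mirrors what we did.

```ruby
# The "new" behavior lives in its own module so that other modules can wrap it.
module Layouts
  def find_layout(name, details)
    prefix = details.key?(:prefix) ? details[:prefix] : "layouts"
    [prefix, name].compact.join("/")
  end
end

# Legacy support, isolated in one place. Included after Layouts, it comes
# earlier in the method lookup chain, so `super` reaches Layouts#find_layout.
module Compatibility
  def find_layout(name, details)
    # Old-style callers passed "layouts/foo"; drop the default prefix for them.
    details[:prefix] = nil if name =~ /\Alayouts\//
    super
  end
end

class Controller
  include Layouts
  include Compatibility # delete this line to drop the legacy behavior
end

puts Controller.new.find_layout("foo", {})         # layouts/foo
puts Controller.new.find_layout("layouts/foo", {}) # layouts/foo
```

Removing the legacy behavior later is then a one-line change, because nothing else in the codebase knows about the old form.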

For HTTP APIs, use External, not Internal Redirects

It can be tempting to simply use Rails’ router to point URLs in transition to their new location. However, this will not inform clients that the URL they are using has changed. If you use a 301 status code (Moved Permanently), browsers and other clients have an opportunity to learn about the new location and behave appropriately.

Think about a 301 status as the equivalent of a deprecation warning for HTTP APIs, allowing clients to take corrective action before you take the final step of removing support for the URL entirely.

If you provide a client library for your API (as you should), print a deprecation notice when your server returns a 301 redirect. This gives you the ability to warn your users that they will need to update their client by making a change to the server only.
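
On the client side, the check can be as small as the sketch below. It uses Ruby's standard Net::HTTP response classes; the helper name and the URLs are hypothetical.

```ruby
require "net/http"

# Hypothetical client-library helper: prints a deprecation notice and returns
# the new location when the server answered 301, or nil otherwise.
def warn_if_moved(response, requested_url)
  return nil unless response.is_a?(Net::HTTPMovedPermanently)
  new_url = response["location"]
  warn "DEPRECATION WARNING: #{requested_url} has moved permanently to " \
       "#{new_url}; please upgrade your client."
  new_url
end

# Typical use inside the client (not executed here, since it needs a server):
#   url = "http://api.example.com/v1/widgets"
#   response = Net::HTTP.get_response(URI(url))
#   new_url = warn_if_moved(response, url)
```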

Cucumber: More Advanced

By | July 9th, 2009 at 6:07AM

In a previous post, I gave you some introductory information on Cucumber, a great framework for writing and executing high level descriptions of your software’s functionality. In this post, I’ll take a deeper dive and talk about a few more advanced Cucumber topics: project structures, multiple language support, scenario tables, free-form stories, tags, hooks and backgrounds. As always, for more detailed information see the documentation and/or The RSpec Book.

Project Structure

Let’s start by taking a look at your project structure: the usual advice is to have a features directory as the root of your Cucumber work. In that directory, you place all of your .feature files which contain your features (as you would expect) as well as support and step_definitions directories.

The support directory should contain whatever support code your features need, and an env.rb file which is responsible for loading any required code that lives outside the feature directory tree.

In the step_definitions directory, you place the files (with .rb extensions) that contain your step definitions. These will all get loaded when your features run.

You can have multiple subdirectories in the features directory for grouping features. This allows you to run the features in a particular directory. While this can be useful, it can be awkward in practice. That’s because (as of this writing) Cucumber loads each Ruby file (ending in .rb) it finds in the directory you tell it to run and, recursively, all subdirectories. For example, consider the following tree (each .rb file simply has a puts "x" where x is the name of the file):

features
   +- 1.rb
   +- 2.rb
   +- sub1
      +- 3.rb
   +- sub2
      +- 4.rb
      +- sub3
         +- 5.rb
         +- sub4
            +- 6.rb

So when you run cucumber features, you get:

1
2
3
4
5
6
0 scenarios
0 steps
0m0.000s

but, if you run cucumber features/sub2 you get:

4
5
6
0 scenarios
0 steps
0m0.000s

The issue is that features don’t inherit support code or steps from parent directories unless that parent is also visited by Cucumber. So if you want to run subdirectories separately and they share setup code or steps, you have to somehow duplicate that code (possibly by explicitly requiring files from up the tree). This isn’t significant, but it is a bit messy. A better approach might be to use tags (described below). If your feature groups don’t share much, then using subdirectories can work fine.
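
The loading rule is easy to mimic with a plain recursive glob. This sketch is not Cucumber’s own code; it only rebuilds the example tree in a temporary directory and shows which files each starting directory would pick up.

```ruby
require "tmpdir"
require "fileutils"

# Every .rb file in `dir` and, recursively, in its subdirectories.
def ruby_files_under(dir)
  Dir.glob(File.join(dir, "**", "*.rb")).sort
end

Dir.mktmpdir do |root|
  %w[features/1.rb features/2.rb features/sub1/3.rb features/sub2/4.rb
     features/sub2/sub3/5.rb features/sub2/sub3/sub4/6.rb].each do |path|
    full = File.join(root, path)
    FileUtils.mkdir_p(File.dirname(full))
    File.write(full, "")
  end

  names = ->(dir) { ruby_files_under(File.join(root, dir)).map { |f| File.basename(f, ".rb") }.sort }
  puts names.call("features").join(" ")      # 1 2 3 4 5 6
  puts names.call("features/sub2").join(" ") # 4 5 6
end
```

Nothing above features/sub2 is ever visited in the second case, which is why shared steps in the parent directory go missing.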

(more…)

5 Ways to Speed Up Your Rails App

By | July 6th, 2009 at 9:07AM

Ruby is a fast language, and a great one in so many ways, but nothing in this world is truly free. It’s very easy to do things that seem inconsequential but that later can bring your application to a grinding halt. In this post, I’ll outline five important ways that you can avoid some of the most common problems Rails apps encounter.

Before continuing, a disclaimer: do not take these tips and refactor your code ad-hoc. Take everything with a grain of salt and perform your own measurements to determine which pieces of your app are slow. Before making any performance optimizations, get set up with a profiling tool, like RubyProf, New Relic, Scout, etc. You always want to know where the most significant bottlenecks are for you, and focus your efforts there first.

Eager Load Associations

The most common and significant problem that I’ve seen in Rails apps has been the lack of eager loaded associations. A simple extra :include when performing ActiveRecord finds will prevent 1+N queries. So for example, if you are displaying a list of posts on your blog homepage and want to display each author’s name as well, load the posts with Post.all(:include => :author). For more complex pages, eager loading works multiple levels deep. Newer versions of ActiveRecord handle complex eager loading cases much more elegantly by splitting up a large join query into multiple smaller queries that make better sense.

Note: only perform the eager load when you actually plan to use the objects, because there’s fairly significant overhead to creating many ActiveRecord objects.
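
To make the 1+N arithmetic concrete, here is a toy simulation in plain Ruby. It is not ActiveRecord: ToyDB and its methods are invented purely to count how many “queries” each access pattern issues.

```ruby
# A toy in-memory "database" that counts queries. Every name here is
# hypothetical and only mimics the query pattern, not ActiveRecord itself.
class ToyDB
  attr_reader :query_count

  def initialize(posts, authors)
    @posts = posts
    @authors = authors
    @query_count = 0
  end

  def all_posts
    @query_count += 1
    @posts
  end

  def author(id)
    @query_count += 1
    @authors.fetch(id)
  end

  def authors_by_id(ids)
    @query_count += 1
    @authors.select { |id, _name| ids.include?(id) }
  end
end

authors = { 1 => "Ann", 2 => "Bob" }
posts   = [{ title: "A", author_id: 1 }, { title: "B", author_id: 2 }, { title: "C", author_id: 1 }]

# Lazy: one query for the posts, then one more per post (1+N).
lazy = ToyDB.new(posts, authors)
lazy_names = lazy.all_posts.map { |post| lazy.author(post[:author_id]) }

# Eager: one query for the posts, one for all of their authors.
eager = ToyDB.new(posts, authors)
eager_posts = eager.all_posts
loaded = eager.authors_by_id(eager_posts.map { |post| post[:author_id] }.uniq)
eager_names = eager_posts.map { |post| loaded.fetch(post[:author_id]) }

puts lazy.query_count  # 4
puts eager.query_count # 2
```

With three posts the difference is small, but the lazy count grows with every post on the page while the eager count stays at two.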

Do Database Work In the Database

In the same vein as the first tip, try leveraging the database when it makes sense. Relational databases are designed to query large amounts of data and return results; Ruby is not.

For example, if you want to check whether the currently logged-in user has commented on an article, you don’t need to load all the comments for that article, iterate through each one, and check whether at least one comment was created by the current user. Doing this will instantiate objects for every single comment and then instantly discard them after the check is done. A much better way to obtain the same result is to push the logic to the database by doing a SELECT COUNT statement. ActiveRecord has an easy way to do this: article.comments.count(:conditions => ["user_id = ?", current_user.id]) > 0

Do as Little as Possible During the HTTP Request Cycle

You want to be able to return a response to the end user’s request as quickly as possible, so only do the bare minimum needed to return the response and defer everything else. For example, actually sending out an email is relatively slow, and users don’t generally care whether emails are sent during the request cycle or right after.

Whether this is implemented using a simple Ruby thread or a robust, distributed queuing system like RabbitMQ doesn’t really matter. Rails 3 will ship with a default queuing system, but until then, I suggest checking out DelayedJob and BackgroundJob.
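
At the “simple Ruby thread” end of that spectrum, the idea looks roughly like this. It is a sketch using only Ruby’s built-in Queue and Thread; deliver_email and handle_signup are hypothetical stand-ins, and a real app would want persistence and error handling that this does not have.

```ruby
DELIVERED = Queue.new
JOBS      = Queue.new

# Hypothetical stand-in for a slow mail delivery.
def deliver_email(address)
  sleep 0.05 # pretend this is a slow SMTP round trip
  DELIVERED << address
end

# A single worker thread drains the queue outside the request cycle.
WORKER = Thread.new do
  while (job = JOBS.pop) != :stop
    job.call
  end
end

# The "request handler" only enqueues the work and returns immediately.
def handle_signup(address)
  JOBS << -> { deliver_email(address) }
  "201 Created"
end

puts handle_signup("user@example.com") # 201 Created

JOBS << :stop # shut the worker down once the queue is drained
WORKER.join
puts DELIVERED.size # 1
```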

Know Your Gems and Plugins

As Rails applications get more complicated, a good thing to do is to use existing plugins and gems instead of recreating the work in house. This usually introduces a significant amount of new code to the application that is relatively unknown.

There are many great Rails plugins out there. But before depending on a new gem or plugin, I suggest at least skimming the source — check for any craziness. Also be sure you’re using plugins for their intended purposes — or things are likely to go awry.

Avoid Creating Unnecessary Objects

Every time Ruby’s garbage collector is triggered, Ruby will stop running your code and start cleaning up unused objects. This process can take between 100 and 400ms on MRI (JRuby has a better behaved, tunable garbage collector through the JVM), which is a noticeable period of time. Avoid this as much as possible. This means avoid creating unnecessary objects. I have already mentioned a couple of ways to do this in the previous tips.

In general, the best way to avoid the unnecessary creation of objects is to understand how Ruby and the libraries in use work. For example, understand the difference between these two snippets:

sentences.map { |s| s.strip }
sentences.each { |s| s.strip! }

The first snippet creates a new Array object and a new String object for each element in the Array. The second snippet just mutates the String objects in the Array without creating new Ruby objects.

Granted, this tip only makes a significant difference when dealing with large data structures, but it’s a good idea to keep in the back of your mind whether or not you actually need to duplicate objects. If you have arrays containing thousands of ActiveRecord objects and use reject instead of reject!, you’ve just created a second array which could potentially have thousands of objects.
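
You can confirm the difference yourself by comparing object identities. This is a small illustrative script; the sample data is made up.

```ruby
sentences = ["  first ", " second  "].map(&:dup) # dup so the strings are safe to mutate

# map + strip: a new Array holding new String objects; the originals are untouched.
stripped = sentences.map { |s| s.strip }
puts stripped.equal?(sentences)             # false
puts stripped.first.equal?(sentences.first) # false

# each + strip!: the same Array and the same String objects, modified in place.
original_ids = sentences.map(&:object_id)
sentences.each { |s| s.strip! }
puts sentences.map(&:object_id) == original_ids # true
puts sentences.inspect                          # ["first", "second"]

# reject returns a second Array; reject! removes elements from the receiver.
numbers = (1..10).to_a
evens   = numbers.reject(&:odd?) # new Array, numbers still has ten elements
numbers.reject!(&:odd?)          # same Array, now [2, 4, 6, 8, 10]
puts evens == numbers            # true (equal contents, different objects)
```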

There are many other aspects of a Ruby on Rails application that can cause bottlenecks; listing them all is obviously impossible. That said, the most important thing to learn is how to locate these bottlenecks. Solving them can be handled on a case by case basis.

Rubinius: The Book Tour

By | July 2nd, 2009 at 9:07AM

This year continues to be a hot one for the Ruby programming language. The use of Ruby is growing, excitement is mounting for the release of Rails 3.0, and development of Ruby 1.9 and the alternative implementations is moving along quickly. It makes sense: bringing more value to your customers in less time with fewer resources is an obvious plus, and Ruby’s a great way to make that happen.

Rubinius, which you’ve no doubt heard lots about over the last few years, is an implementation of the Ruby language written from scratch using cutting edge technology and the best industry research. Based on the questions we’ve received over the past few months, it’s clear that a lot of folks are looking to learn more about the technologies behind the project. This is exciting because with so much written in Ruby, Rubinius positively begs Ruby developers to experiment and explore.

In this post I’ll describe each of the basic parts of Rubinius, and provide some helpful links to books that I’ve found particularly useful in understanding how Rubinius is built.

(more…)
