The Boys and Girl of Summer

By Tom Mornini | August 6th, 2010 at 12:08PM

Mark Twain once said “The coldest winter I ever saw was the summer I spent in San Francisco.” While the micro-climate of SOMA (South of Market), the area where Engine Yard HQ is located, has considerably more sun than the famously foggy western side of town, the winds of change are definitely blowing through here this summer.

Tammer Saleh joined us 3 weeks ago as Director of Application Development. Most recently Tammer operated his own consultancy practice, and he is a well known and respected member of the Ruby community. He’s already identified a number of quick wins that will continue the rapid fire development of AppCloud. While the AppCloud team has been absolutely killing it, I have confidence that Tammer’s skills and techniques will further press the pace.

As we continue to grow, we felt the time was right to reaffirm our commitment to open source.  We don’t want newcomers to the community to think we’re a mere commercial entity, as opposed to the open source symbiote that long time members know us to be! Today I’m announcing two hires that will, I believe, make our commitment abundantly clear.

Dr. Nic Williams will be arriving from Australia to take the role of VP of Technology. His primary responsibility will be to organize and guide Engine Yard’s open source efforts. He has already blogged about his pending move; perhaps I should have left off the Mark Twain quote? Thank you for your sacrifice, Mrs. Dr. Nic! Hopefully you and my wife, Elizabeth, will become fast friends! I find San Francisco to be a friendly and wonderful place to live and suspect you will too! :-)

Roger Levy will be joining us later in the month to oversee engineering, support and product management in his role as SVP of Products. Roger, who managed the SUSE Linux business at Novell, certainly has the open source experience and credentials to continue to reinforce Engine Yard’s commitment to open source.

Finally, we’ve also added Sara Gardner as VP of Marketing and Steve Gross as VP of Business Development. There are so many things to inform the community about, and so many great companies to partner with, that Sara and Steve are already busy! I welcome them to Engine Yard and anxiously anticipate their unique contributions.

Startups that grow quickly place a LOT of stress on their founders and early employees. Many founders thrive on this; I know that Lance and I did. But as a company grows, priorities and roles change. I won’t argue with those who say that staying small is beautiful, as I agree with much of what has been said on that subject. We, however, chose a different path: go big or go home! :-)

Perhaps, then, it should not be a surprise that Ezra and his family have decided to move to Portland to be closer to family. I wish my good friend a very fond farewell, and a reduction of the stress that he has endured along with the rest of the founders and early employees of Engine Yard.

I cannot express how exciting and fulfilling it has been to steer Engine Yard over the last 4 years, from an early advocate of Ruby on Rails and its community, to the force that it is today. As the last remaining founder, I must admit that I’m very proud of what we have achieved during that time, both at Engine Yard and beyond! Lance Walley is now CEO of Chargify, an uber-cool recurring billing service. Jayson Vantuyl has created a successful consulting business and is up to something sneaky as well! And while only time will tell what Ezra shall choose to pursue next, I’m certain that additional success awaits him.

Finally, I’d like to close with something that I wish I could say every time I open my mouth: Thank you to our nearly 1,500 customers and all of my hard working, talented and dedicated employees. Perhaps the highest praise one person can offer another is “you make my dreams possible” and for making mine possible, I’ll forever be grateful to each and every one of you. :-)

Monitoring Memory with JRuby, Part 1: jhat and VisualVM

By Charles Oliver Nutter | August 4th, 2010 at 11:08AM

There’s been a lot of fuss made lately over memory inspection and profiling tools for Ruby implementations. And it’s not without reason; inspecting a Ruby application’s memory profile, much less diagnosing problems, has traditionally been very difficult. At least, difficult if you don’t use JRuby.

Because JRuby runs on the JVM, we benefit from the dozens of tools that have been written for the JVM. Among these tools are numerous memory inspection, profiling, and reporting tools, some built into the JDK itself. Want a heap dump? Check out the jmap (Java memory map) and jhat (Java heap analysis tool) shipped with Hotspot-based JVMs (Sun, OpenJDK). Looking for a bit more? There’s the Memory Analysis Tool based on Eclipse, the YourKit memory and CPU profiling app, VisualVM, now also shipped with Hotspot JVMs…and many more. There’s literally dozens of these tools, and they provide just about everything you can imagine for investigating memory.

In this post, I’ll show how you can use two of these tools: VisualVM, a simple, graphical tool for exploring a running JVM; and the jmap/jhat combination, which allows you to dump the memory heap to disk for inspection offline.

Getting JRuby Prepared

All these tools work with any version of JRuby, but as part of JRuby 1.6 development I’ve been adding some enhancements. Specifically, I’ve made some modifications that allow Ruby objects to show up side-by-side with Java objects in memory profiles. A little explanation is in order.

In JRuby, all the core classes are represented by “native” Java classes. Object is represented by org.jruby.RubyObject, String is org.jruby.RubyString, and so on. Normally, if you extend one of the core classes, we don’t actually create a new “native” class to represent it; instead, all user-created classes that extend Object simply show up as RubyObject in memory. This is still incredibly useful; you can look into RubyObject and see the metaClass field, which indicates the actual Ruby type.

Let’s see what that looks like, so we know where we’re starting from. We’ll run a simple script that creates a custom class, instantiates and saves 10000 instances of it, and then sleeps.

~/projects/jruby ➔ cat foo_heap_example.rb
class Foo
end

ary = []
10000.times { ary << Foo.new }

puts "ready for analysis!"
sleep

~/projects/jruby ➔ jruby foo_heap_example.rb
ready for analysis!

So we have our test subject ready to go. To use the jmap tool, we need the pid of this process. Of course we can use the usual shell tricks to get it, but the JDK comes with a nice tool for finding all JVM pids active on the system: jps

~/projects/jruby ➔ jps -l
52862 sun.tools.jps.Jps
52857 org/jruby/Main
48716 com.sun.enterprise.glassfish.bootstrap.ASMain

From this, you can see I have three JVMs running on my system right now: jps itself; our JRuby instance; and a GlassFish server I used for testing earlier today. We’re interested in the JRuby instance, pid 52857. Let’s see what jmap can do with that.
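If you would rather not pick the process out of the jps listing, the script can also report its own pid. Here is a minimal variation on foo_heap_example.rb. It assumes Process.pid returns the operating system pid of the JVM, which I believe holds wherever JRuby has native POSIX support. The ready_banner helper and the --wait flag are names made up for this sketch, not part of the original script.

```ruby
class Foo
end

ary = []
10000.times { ary << Foo.new }

# Build the banner separately so the pid is easy to spot (helper name is hypothetical).
def ready_banner(pid = Process.pid)
  "ready for analysis! pid=#{pid}"
end

puts ready_banner

# Keep the process alive for jmap only when asked to;
# "--wait" is a made-up flag for this sketch.
sleep if ARGV.include?("--wait")
```

Run it as "jruby foo_heap_example.rb --wait" and pass the printed pid straight to jmap.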

~/projects/jruby ➔ jmap
Usage:
    jmap [option] <pid>
        (to connect to running process)
    jmap [option] <executable <core>
        (to connect to a core file)
    jmap [option] [server_id@]<remote server IP or hostname>
        (to connect to remote debug server)

where <option> is one of:
    <none>               to print same info as Solaris pmap
    -heap                to print java heap summary
    -histo[:live]        to print histogram of java object heap; if the "live"
                         suboption is specified, only count live objects
    -permstat            to print permanent generation statistics
    -finalizerinfo       to print information on objects awaiting finalization
    -dump:<dump-options> to dump java heap in hprof binary format
                         dump-options:
                           live         dump only live objects; if not specified,
                                        all objects in the heap are dumped.
                           format=b     binary format
                           file=<file>  dump heap to <file>
                         Example: jmap -dump:live,format=b,file=heap.bin <pid>
    -F                   force. Use with -dump:<dump-options> <pid> or -histo
                         to force a heap dump or histogram when <pid> does not
                         respond. The "live" suboption is not supported
                         in this mode.
    -h | -help           to print this help message
    -J<flag>             to pass <flag> directly to the runtime system

The simplest option here is -histo, to print out a histogram of the objects on the heap. Let’s run that against our JRuby instance.

~/projects/jruby ➔ jmap -histo:live 52857

 num     #instances         #bytes  class name
----------------------------------------------
   1:         22677        3192816  <constMethodKlass>
   2:         22677        1816952  <methodKlass>
   3:         35089        1492992  <symbolKlass>
   4:          2860        1389352  <instanceKlassKlass>
   5:          2860        1193536  <constantPoolKlass>
   6:          2798         739264  <constantPoolCacheKlass>
   7:          5861         465408  [B
   8:          5399         298120  [C
   9:          3042         292032  java.lang.Class
  10:          4037         261712  [S
  11:         10002         240048  org.jruby.RubyObject
  12:          3994         179928  [[I
  13:          5474         131376  java.lang.String
  14:          1661          95912  [I
...

The resulting output is a listing of literally every object in the system...not just Ruby objects even! The value of this should be apparent; not only can you start to investigate the memory overhead of code you've written, you'll also be able to investigate the memory overhead of every library and every piece of code running in the same process, right down to byte arrays (the "[B" above) and "native" Java strings ("java.lang.String" above). And so far we haven't had to do anything special to JRuby. Nice, eh?

So, back to the matter at hand: the Foo class from our example. Where is it?

Well, the answer is that it's right there; 10000 of those 10002 org.jruby.RubyObject instances are our Foo objects; the other two are probably objects constructed for JRuby runtime purposes. But obviously, there's nothing in this output that tells us how to find our Foo instances. This is what I'm remedying in JRuby 1.6.

On JRuby master, there's now a flag you can pass that will stand up a JVM class for every user-created Ruby class. Among the many benefits of doing this, we also get a more useful profile. Let's see how to use the flag (which will either be default or very easy to access by the time we release JRuby 1.6).

~/projects/jruby ➔ jruby -J-Djruby.reify.classes=true foo_heap_example.rb
ready for analysis!

If we run jmap against this new instance, we see a more interesting result.

 num     #instances         #bytes  class name
----------------------------------------------
   1:         22677        3192816  <constMethodKlass>
   2:         22677        1816952  <methodKlass>
   3:         35089        1492992  <symbolKlass>
   4:          2860        1389352  <instanceKlassKlass>
   5:          2860        1193536  <constantPoolKlass>
   6:          2798         739264  <constantPoolCacheKlass>
   7:          5863         465456  [B
   8:          5401         298208  [C
   9:          3042         292032  java.lang.Class
  10:          4037         261712  [S
  11:         10000         240000  ruby.Foo
  12:          3994         179928  [[I
  13:          5476         131424  java.lang.String
  14:          1661          95912  [I

A-ha! There's our Foo instances! The "reify classes" option generates a JVM class of the same name as the Ruby class, prefixed by "ruby." to separate it from other JVM classes. Now we can start to see the real power of the tools, and we're just at the beginning. Let's see what a simple Rails application looks like.

~/projects/jruby ➔ jmap -histo:live 52926 | grep " ruby."
  29:         11685         280440  ruby.TZInfo.TimezoneTransitionInfo
  97:           970          23280  ruby.Gem.Version
  98:           914          21936  ruby.Gem.Requirement
 122:           592          14208  ruby.TZInfo.TimezoneOffsetInfo
 138:           382           9168  ruby.Gem.Dependency
 159:           265           6360  ruby.Gem.Specification
 201:           142           3408  ruby.ActiveSupport.TimeZone
 205:           118           2832  ruby.TZInfo.DataTimezoneInfo
 206:           118           2832  ruby.TZInfo.DataTimezone
 273:            41            984  ruby.Gem.Platform
 383:            14            336  ruby.Mime.Type
 403:            13            312  ruby.Set
 467:             8            192  ruby.ActionController.MiddlewareStack.Middleware
 476:             8            192  ruby.ActionView.Template
 487:             7            168  ruby.ActionController.Routing.DividerSegment
 508:             6            144  ruby.TZInfo.LinkedTimezoneInfo
 523:             6            144  ruby.TZInfo.LinkedTimezone
 810:             4             96  ruby.ActionController.Routing.DynamicSegment
2291:             2             48  ruby.ActionController.Routing.Route
2292:             2             48  ruby.I18n.Config
2293:             2             48  ruby.ActiveSupport.Deprecation.DeprecatedConstantProxy
2298:             2             48  ruby.ActionController.Routing.ControllerSegment
...

This time I've opted to grep out just the "ruby." items in the histogram, and the results are pretty impressive! We can see the baffling fact that there are 970 instances of Gem::Version, using at least 23280 bytes of memory. We can see the even more depressing fact that there are 11685 live instances of TZInfo::TimezoneTransitionInfo, using at least 280440 bytes.
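If grep is too blunt, the histogram is easy to post-process in Ruby. The sketch below parses rows in the format shown above, keeps the "ruby." classes, and maps the JVM names back to Ruby constant names. That mapping (strip the "ruby." prefix, turn "." into "::") is an assumption read off the listing, not taken from JRuby's source, and HistoRow, parse_histo, ruby_rows and ruby_name are names invented for this example.

```ruby
# One histogram row: rank, instance count, shallow bytes, JVM class name.
HistoRow = Struct.new(:rank, :instances, :bytes, :jvm_class)

# Parse text printed by `jmap -histo`; header and separator lines do not
# match the pattern and are skipped.
def parse_histo(text)
  text.each_line.map { |line|
    m = line.match(/\A\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S+)\s*\z/)
    m && HistoRow.new(m[1].to_i, m[2].to_i, m[3].to_i, m[4])
  }.compact
end

# Keep only the classes generated for Ruby classes.
def ruby_rows(rows)
  rows.select { |row| row.jvm_class.start_with?("ruby.") }
end

# Assumed mapping, inferred from the listing: ruby.Gem.Version -> Gem::Version.
def ruby_name(jvm_class)
  jvm_class.sub(/\Aruby\./, "").gsub(".", "::")
end

# Three rows copied from the histograms above.
SAMPLE = <<-HISTO
  13:          5474         131376  java.lang.String
  29:         11685         280440  ruby.TZInfo.TimezoneTransitionInfo
  97:           970          23280  ruby.Gem.Version
HISTO

rows = ruby_rows(parse_histo(SAMPLE))
rows.each do |row|
  per_instance = row.bytes / row.instances
  puts "#{ruby_name(row.jvm_class)}: #{row.instances} instances, #{per_instance} bytes each"
end
```

Swap SAMPLE for STDIN.read to pipe real jmap output through it. On the figures above, both classes work out to 24 bytes per instance, which is why the totals are "at least" numbers: the histogram counts only the object itself, not what it references.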

Now that we're getting useful data, let's look at the first of our tools in more detail: jmap and jhat.

jmap and jhat

As you might guess, I do a lot of profiling in the process of developing JRuby. I've used probably a dozen different tools at different times. But the first tool I always reach for is the jmap/jhat combination.

You've seen the simple case of using jmap above, generating a histogram of the live heap. Let's take a look at an offline heap dump.

~/projects/jruby ➔ jmap -dump:live,format=b,file=heap.bin 52926
Dumping heap to /Users/headius/projects/jruby/heap.bin ...
Heap dump file created

That's how easy it is! The binary dump in heap.bin is supported by several tools: jhat (obviously), VisualVM, the Eclipse Memory Analysis Tool, and others. It's not officially a "standard" format, but it hasn't changed in a long time. Let's have a look at jhat options.

~/projects/jruby ➔ jhat
ERROR: No arguments supplied
Usage:  jhat [-stack <bool>] [-refs <bool>] [-port <port>] [-baseline <file>] [-debug <int>] [-version] [-h|-help] <file>

 -J<flag>          Pass <flag> directly to the runtime system. For
     example, -J-mx512m to use a maximum heap size of 512MB
 -stack false:     Turn off tracking object allocation call stack.
 -refs false:      Turn off tracking of references to objects
 -port <port>:     Set the port for the HTTP server.  Defaults to 7000
 -exclude <file>:  Specify a file that lists data members that should
     be excluded from the reachableFrom query.
 -baseline <file>: Specify a baseline object dump.  Objects in
     both heap dumps with the same ID and same class will
     be marked as not being "new".
 -debug <int>:     Set debug level.
       0:  No debug output
       1:  Debug hprof file parsing
       2:  Debug hprof file parsing, no server
 -version          Report version number
 -h|-help          Print this help and exit
 <file>            The file to read

For a dump file that contains multiple heap dumps,
you may specify which dump in the file
by appending "#<number>" to the file name, i.e. "foo.hprof#3".

All boolean options default to "true"

Generally you can just point jhat at a heap dump and away it goes. Occasionally if the heap is large, you may need to use the -J option to increase the maximum heap size of the JVM jhat runs in. Since we’re running a Rails app, we’ll bump the heap up a little bit.

~/projects/jruby ➔ jhat -J-Xmx200M heap.bin
Reading from heap.bin...
Dump file created Fri Jul 09 02:07:46 CDT 2010
Snapshot read, resolving...
Resolving 604115 objects...
[much verbose logging elided for brevity]

Chasing references, expect 120 dots........................................................................................................................
Eliminating duplicate references........................................................................................................................
Snapshot resolved.
Started HTTP server on port 7000
Server is ready.

“Server is ready”? Damn you Java people! Does everything have to be a server with you?

In this case, it’s actually an incredibly useful tool. jhat starts up a small web application on port 7000 that allows you to click through the dump file. Let’s see what that looks like.

Here’s the front page of the tool. We see a listing of all JVM classes in the system. If you scroll to the bottom, there’s a few more general functions.

Let’s go with what we know and view the heap histogram again.

Here we can see that there’s lots of objects taking up memory, and they’re a mix of JVM-native types, JRuby implementation classes, and actual Ruby classes. In fact, here we can see our friend TZInfo::TimezoneTransitionInfo again. Let’s click through.

Pretty mundane stuff so far; basically just information about the class itself. But you see at the bottom of this screenshot that we can go from here to viewing all instances of TimezoneTransitionInfo. Let’s try that.

Ahh, that’s more like it! Now we can see that there’s a heck of a lot of these things floating around. Let’s investigate a bit more and click through the first instance.

Now this is some cool stuff!

We can see that the JVM class generated for TimezoneTransitionInfo has three fields: metaClass, which points at the Ruby Class object; varTable, which is an array of Object references used for instance variables and other “internal” variables; and a flags field containing runtime flags for the object, like whether it’s frozen, tainted, and so on. We can see that this object has no special flags set, and we can dig deeper into those fields if we like. We’ll skip that today.

Moving further down, we see a few more amazing links. First, there’s a list of all references to this object. Ahh, now we can start to investigate why they’re staying in memory, even though we’re not using them. We can even have jhat show us the full chains of references keeping these objects alive; a series of objects leading all the way back to one “rooted” by a thread or by global JVM state. And we can explore the other direction as well, walking all objects reachable from this one.
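To make the "chain of references back to a root" idea concrete, here is a toy model in plain Ruby. It is only an illustration of the concept, assuming a heap reduced to a hash of labels, and says nothing about how jhat is implemented. Every label in REF_GRAPH is hypothetical.

```ruby
# Toy heap: each key is an object label, each value lists the labels it references.
# All labels are hypothetical; a real dump has hundreds of thousands of objects.
REF_GRAPH = {
  "root:thread"   => ["cache"],
  "cache"         => ["array"],
  "array"         => ["transition#1", "transition#2"],
  "transition#1"  => [],
  "transition#2"  => [],
  "orphan"        => ["transition#3"],
  "transition#3"  => []
}

# Breadth-first search from the roots. Returns the shortest chain of
# references that ends at `target`, or nil if no root can reach it.
def chain_to(graph, roots, target)
  queue = roots.map { |root| [root] }
  seen = roots.dup
  until queue.empty?
    path = queue.shift
    return path if path.last == target
    graph.fetch(path.last, []).each do |ref|
      next if seen.include?(ref)
      seen << ref
      queue << (path + [ref])
    end
  end
  nil
end

chain = chain_to(REF_GRAPH, ["root:thread"], "transition#2")
puts chain.join(" -> ")
```

An object with a chain like this cannot be collected until one of the links is broken. An object with no chain at all (transition#3 here, referenced only by an unrooted orphan) is garbage waiting for the next GC.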

This is only a small part of what you can do with jmap and jhat, and they’re so simple to use it feels almost criminal. But what if we want to inspect an application while it’s running? Dumping heaps and analyzing them offline can tell you much of the story, but sometimes you just want to see the objects coming and going yourself. Let’s move on to VisualVM.

VisualVM

VisualVM spawned out of the NetBeans profiling tools. One of the biggest complaints about the JVMs of old was that all the built-in tooling seemed to be designed for JVM engineers alone. Because Sun had the foresight to build and own their own IDE and related modules, it eventually became a natural fit to pull out the profiling tools for use by everyone. And so VisualVM was born.

On most systems with Java 6 installed, you should have a “jvisualvm” command. Let’s run it now.

When you start up VisualVM, you’re presented with a list of running JVMs, similar to using the ‘jps’ command. You can also connect to remote machines, browse offline heap and core dump files, and look through memory and CPU profiling snapshots from previous runs. Today, we’ll just open up our running Rails app and see what we can see.

VisualVM connects to the running process and brings up a basic information pane with process information, JVM information, and so on. We’re interested in monitoring heap usage, so let’s move to the “Monitor” tab.

Already we’re getting some useful information. This view shows CPU usage (currently zero, since it’s an idle Rails app), Heap usage over time, and the number of JVM classes and threads that are active. We can trigger a full GC, if we’d like to tidy things up before we start poking around. But most importantly, we can do the jmap/jhat dance in one step, by clicking the Heap Dump button. Tantalizing, isn’t it?

Initially, we see a basic summary of the heap: total size, number of classes and GC roots, and so on. We’re looking for our friend TimezoneTransitionInfo, so let’s look for it in the “Classes” pane.

Ahh, there it is, just a little ways down the list. The counts are as we expect, so let’s double-click and dig a bit deeper.

Here we have a lot of the same information about object instances that we did with jhat, but presented in a much richer format. Almost everything is active; you can jump around the heap and do analysis that would take a lot of manual work very easily. Let’s try another tool: the Retained Size calculator.

Because our JVM tools see all objects equally, the reported size for a Ruby object on the heap is only part of the story. There’s also the variable table, the object’s instance variables, and objects they reference to consider. Let’s jump to a different object now, Gem::Version.
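The difference between the reported (shallow) size and the retained size can be sketched with a toy heap. This assumes the usual definition of retained size, the bytes that would be freed if the object went away. It is not VisualVM's algorithm, and all labels and byte counts below are made up for illustration.

```ruby
# Hypothetical mini-heap: label => [shallow size in bytes, referenced labels].
SIZED_HEAP = {
  "root"      => [0,  ["version_a", "version_b"]],
  "version_a" => [24, ["vars_a"]],
  "vars_a"    => [32, ["string_a", "shared"]],
  "string_a"  => [40, []],
  "version_b" => [24, ["shared"]],
  "shared"    => [48, []]
}

# Labels of every object reachable from `start`, optionally pretending
# that `without` has been removed from the heap.
def reachable(heap, start, without = nil)
  seen = []
  stack = [start]
  until stack.empty?
    label = stack.pop
    next if label == without || seen.include?(label)
    seen << label
    stack.concat(heap.fetch(label)[1])
  end
  seen
end

# Retained size: total shallow size of the objects that are reachable
# from the root only by passing through `label` (including `label` itself).
def retained_size(heap, root, label)
  freed = reachable(heap, root) - reachable(heap, root, label)
  freed.inject(0) { |sum, l| sum + heap.fetch(l)[0] }
end

%w[version_a version_b].each do |label|
  shallow = SIZED_HEAP.fetch(label)[0]
  puts "#{label}: shallow #{shallow}, retained #{retained_size(SIZED_HEAP, 'root', label)}"
end
```

Both version objects have the same shallow size, but version_a also keeps its variable table and a string alive, while the shared object counts toward neither because something else still references it.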

We don’t want to have to scroll through the list of classes to find ruby.Gem.Version, so let’s make use of the Object Query Language console. With the OQL console, you can write SQL-like queries to retrieve listings of objects in the heap. We’ll search for all instances of ruby.Gem.Version.

The query runs and we get a listing of Gem::Version objects. Let’s dig deeper and see how much retained memory each Version object is keeping alive.

Clicking on the “Compute Retained Sizes” link in the “Instances” pane prompts us with this dialog. We’re tough…we can take it.

Reticulating splines…

So it looks like each of the Version objects takes from 125 to 190 bytes for a total of 19400 bytes, most of which is from the variable table. What’s in there?

Ahh…looks like there’s a String and an Array. And of course we can poke around the heap ad infinitum, into and out of “native” JRuby and JVM classes, and truly get a complete picture of what our running applications look like. Now you’re playing with power.

Your Turn

This is obviously only the tip of the iceberg. Tools like Eclipse Memory Analysis Tool include features for detecting leaks; VisualVM and NetBeans both allow you to turn on allocation tracing, to show where in your code all those objects are being created. There’s tools for monitoring live GC behavior, and many of these tools even allow you to dig into a running heap and modify live objects. If you can dream it, there’s a tool that can do it. And you get all that for free by using JRuby.

If you’d like to play with this, it all works with JRuby 1.5.1 but you won’t get the nice JVM classes for Ruby classes. For that, you can pull and build JRuby master, download a 1.6.0.dev snapshot, or just wait for JRuby 1.6. And if you do play with these or other tools, I hope you’ll let us know and blog about your experience!

In the future, I’ll try to show some of the other tools plus some of the CPU profiling capabilities they bring to the table. For now, rest assured that if you’re using JRuby, you really do have the best tools available to you.

This article was originally published on Charles Nutter’s blog Headius.

A Gentle Introduction to Isolation Levels

By Xavier Shay | July 21st, 2010 at 10:07AM

Hello all,

Our latest post is from a special guest and Engine Yard partner Xavier Shay. He’ll be running a pair of training sessions on “using your database to make your Ruby on Rails applications rock solid” at Engine Yard’s San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.

Bob opens a database transaction and selects everything from the books table. Tom comes along and adds a new book, then Bob, in his same transaction, repeats his same query for all the books. Does Bob see the new book that Tom added?

The answer is that you get to choose! It’s important to understand what your choices are (and what choice your preferred database makes for you) so that you can ensure your code executes in a way that you intend.

The SQL standard specifies levels for how “isolated” transactions running at the same time are, all the way from being able to see uncommitted changes (not isolated) to effectively running the transactions in serial (full isolation). Academically there are eight levels of isolation, but for most purposes you only need to worry about the four defined by the standard. MySQL implements all four, PostgreSQL only two. You can specify a global isolation level for your database, but also override it for individual transactions.

The easiest to understand are the extreme levels: no isolation and total isolation. The first of these is known as read uncommitted, and it allows Bob to read the new book that Tom is adding even before Tom has committed his changes. As you can imagine this level is mostly useless; however, it can very occasionally be handy in some reporting situations.

At the other end of the spectrum is full isolation, known in the spec as serializable. Bob will never see the new book that Tom is adding until he starts a new transaction. The database Bob sees is consistent—within the one transaction, the same query will always return the same result. At first glance this level seems like a great option but there’s a lot of overhead involved, it drastically reduces the amount of concurrency you can achieve, and for most purposes the serializable level is overkill.

There are two isolation levels in between read uncommitted and serializable: read committed and repeatable read. This is where it gets interesting. Read committed is the default isolation level in PostgreSQL and Oracle, and is one step up from read uncommitted. It is the most “common sense” level: Bob will not see any changes made by Tom until Tom commits them.

MySQL defaults to repeatable read. In this level, Bob will not see any updates Tom commits, but will see any inserts. Say in Bob’s first select he sees one book titled “The Odessey”. Tom then fixes the spelling mistake to “The Odyssey”, and also adds Homer’s other epic poem “The Iliad”. When Bob selects all books again, he will see “The Odessey” (old title, no spelling fix) and “The Iliad” (the inserted book).

To summarize, the four levels from least isolated to most isolated are: read uncommitted, read committed, repeatable read, and serializable. They define what types of changes made by Tom that Bob will be able to see within a single transaction.
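The summary can be written down as a small lookup table. The sketch below encodes what the SQL standard permits at each level in terms of its three read phenomena (dirty reads, non-repeatable reads, phantom reads) and translates that into what Bob may see of Tom's work. Individual databases are free to be stricter than the standard, and ISOLATION_LEVELS and visible_to_bob are names invented for this example.

```ruby
# What the SQL standard still permits at each isolation level,
# ordered from least isolated to most isolated.
ISOLATION_LEVELS = [
  # level name          dirty read  non-repeatable read  phantom read
  ["read uncommitted",  true,       true,                true],
  ["read committed",    false,      true,                true],
  ["repeatable read",   false,      false,               true],
  ["serializable",      false,      false,               false]
]

# What Bob may see of Tom's changes when re-running a query
# inside a single transaction at the given level.
def visible_to_bob(level)
  row = ISOLATION_LEVELS.assoc(level)
  raise ArgumentError, "unknown isolation level: #{level}" unless row
  _, dirty, non_repeatable, phantom = row
  {
    uncommitted_changes: dirty,          # dirty read
    committed_updates:   non_repeatable, # non-repeatable read
    committed_inserts:   phantom         # phantom read
  }
end

ISOLATION_LEVELS.each do |name, *_|
  puts "#{name}: #{visible_to_bob(name).inspect}"
end
```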

In Practice

Say the books we are selecting are ordered based on an arbitrary position column (they’re on our bookshelf, for instance). Assume read committed isolation level.

Title       | Position
----------------------
The Odyssey | 1
The Iliad   | 2
The Nostoi  | 3

Bob wants to move “The Odyssey” to the bottom position. To do this, he needs to update its position to the bottom of the list (position 4), then subtract 1 from all positions. At the same time, Tom is adding a new book “The Cypria”. Working this through:

  1. Bob checks the bottom position, finds it to be 4
  2. Tom inserts “The Cypria” in the bottom position of 4
  3. Bob updates the position of “The Odyssey” to 4
  4. Bob subtracts 1 from all positions, and since he is using read committed he will “see” and update the newly inserted book.
  5. Both “The Odyssey” and “The Cypria” have a position of 3

Title       | Position
----------------------
The Iliad   | 1
The Nostoi  | 2
The Odyssey | 3
The Cypria  | 3
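The interleaving above can be replayed in plain Ruby. This is an in-memory simulation with an array standing in for the books table, so no database or real transaction is involved. It only shows how the five steps produce the duplicate position.

```ruby
# In-memory stand-in for the books table; no database is involved.
books = [
  { title: "The Odyssey", position: 1 },
  { title: "The Iliad",   position: 2 },
  { title: "The Nostoi",  position: 3 }
]

# 1. Bob checks the bottom position and finds it to be 4.
bobs_bottom = books.map { |b| b[:position] }.max + 1

# 2. Tom inserts "The Cypria" in the bottom position, which is also 4.
toms_bottom = books.map { |b| b[:position] }.max + 1
books << { title: "The Cypria", position: toms_bottom }

# 3. Bob updates "The Odyssey" to the position he read in step 1.
books.find { |b| b[:title] == "The Odyssey" }[:position] = bobs_bottom

# 4. Bob subtracts 1 from all positions. Under read committed he "sees"
#    Tom's committed row, so it is updated as well.
books.each { |b| b[:position] -= 1 }

# 5. Two books now share a position.
books.sort_by { |b| b[:position] }.each do |b|
  puts "#{b[:title].ljust(11)} | #{b[:position]}"
end
duplicates = books.group_by { |b| b[:position] }.select { |_, rows| rows.size > 1 }
puts "duplicate positions: #{duplicates.keys.inspect}"
```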

If Bob had used the serializable level, the list would have remained consistent for his entire transaction, so his update would not have affected “The Cypria” that Tom inserted, and so would not have updated its position from 4 to 3. (In practice the way databases normally handle this is to actually abort one of the transactions with an error.)

For those using Rails, you may have recognized the above scenario as a typical acts_as_list scenario, and you’d be correct. In a default configuration, the acts_as_list plugin makes the same mistake outlined above, and will leave you with inconsistent data. The quickest fix is to wrap all list operations in a serializable transaction.

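# MySQL syntax. SET SESSION only applies to transactions started after it,
# so it has to run before the transaction opens. The session then stays
# at serializable until it is changed again.
Book.connection.execute("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE")
Book.transaction do
  @book = Book.find_by_name("The Odyssey")
  @book.move_to_bottom
end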

(It may have occurred to you that some locking or a unique index on position could avoid the exact scenario above, but that breaks acts_as_list and fails to address some other edge cases left as an exercise for the reader. The main point for the purpose of this article is to understand why it breaks under read committed, but works under serializable.)
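Since a database running at the serializable level may abort one of the competing transactions with an error, calling code usually needs to be ready to try again. Below is a minimal retry sketch in plain Ruby. SerializationFailure is a hypothetical stand-in for whatever exception your adapter raises, and with_retries is a helper invented for this example, not part of Rails or acts_as_list.

```ruby
# Hypothetical stand-in for the error raised when the database aborts a
# serializable transaction; the real class depends on your adapter.
class SerializationFailure < StandardError; end

# Run the block, retrying a bounded number of times if it is aborted.
# The block receives the current attempt number, starting at 1.
def with_retries(max_attempts = 3)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue SerializationFailure
    retry if attempts < max_attempts
    raise
  end
end

# Simulated usage: the first attempt loses to a concurrent transaction,
# the second one goes through.
result = with_retries do |attempt|
  raise SerializationFailure, "transaction aborted" if attempt == 1
  "moved to bottom on attempt #{attempt}"
end
puts result
```

In a real application the block would contain the whole transaction, so that each retry starts from a fresh read of the list.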

As a general rule, read committed is a sensible default. It’s easy to reason about, fast, and forces you to be explicit about your locking strategy. Jump up to serializable when needed, usually when dealing with ranges. MySQL’s repeatable read default can be confusing and can deadlock in unintuitive ways; as such, it is not recommended.

This has been a very brief introduction to the four standard SQL isolation levels: read uncommitted, read committed, repeatable read, and serializable. Hopefully it has helped you get your head around them. I’ll be going into much more detail with practical hands on exercises in my training days at Engine Yard’s San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.

Engine Yard AppCloud CLI

By Corey Donohoe | July 20th, 2010 at 9:07AM

At Engine Yard we’ve been helping developers ship Ruby applications for almost four years. Our approach to deployment has changed a few times but at its core our focus has always been helping people deploy and scale Ruby on Rails applications on virtualized hardware. Almost two years ago, we started experimenting with Amazon’s AWS service and realized that people wanted more of a self service setup. For the first time, we decided to take a stab at providing the same kind of service on other people’s hardware instead of our own. This has grown into our AppCloud offering. Today, we’re happy to announce an awesome new addition to AppCloud that enables developers to ship code faster, easier, and straight from the command line.

A Bit of EY History

In our early days, we provided our customers with customized capistrano recipes to deploy their Ruby applications to our clusters. A problem quickly arose because we also needed to help them maintain this recipe as we helped them scale their applications. We learned that keeping our customers’ capistrano recipes up to date was a truly painful exercise, so when we built AppCloud we went with a more centralized approach.

Early AppCloud Direction

Keeping most of the deployment-related information in sync had been so painful that we built a web-based deployment strategy to solve it. It wasn’t the worst idea ever, but leaving your shell to go to a web browser isn’t really what developers want. In addition, we were so excited about the idempotency that chef offered at a configuration level that we felt it was imperative to “verify” the state of the system with a chef run each time we shipped code. This made pushing code slower than necessary and occasionally created panic situations if the chef run failed for some strange reason. People could still use capistrano with AppCloud, but it required them to re-download their deployment recipes every time their environment changed. There also wasn’t an easy way to maintain customizations if customers kept having to re-download the capistrano recipe. Over and over again, we kept hearing the same complaint from customers: they liked the provisioning flexibility on AWS, but shipping code on AppCloud was suboptimal. A few months ago, we finally admitted that our intentions were correct but we hadn’t been doing the best things for our customers. We started working on a way to help our customers ship code more effectively.

Customer Feedback is Awesome

We accepted that idempotency is extremely important when it comes to system configuration but that doesn’t mean you need to re-run chef each time you ship application code. We realized that people want to see their code running on their servers ASAP. Finally, we embraced the idea that people want to ship code with a command line tool similar to the way most people use rake to run their test suite. We’re happy to introduce a more pleasant way to ship code to AppCloud, the engineyard gem.

A Better Workflow

The old way of deploying with chef works, but it forces you to reconfigure your servers every single time you deploy. The workflow looked like this:

  • Boot some instances (provision, configure, deploy)
  • Ship code (run configuration, deploy code)
  • Ship code (run configuration, deploy code)
  • Ship code (run configuration, deploy code)
  • Tweak system configuration (configure)
  • Ship code (configure, deploy)
  • Ship code (configure, deploy)

With the Engine Yard CLI, you can deploy without verifying your system’s configuration, so it’s quite a bit faster to ship new code.

The new workflow looks like this:

  • Boot some instances (provision, configure)
  • Ship code (deploy)
  • Ship code (deploy)
  • Ship code (deploy)
  • Tweak system configuration (configure)
  • Ship code (deploy)
  • Ship code (deploy)

We really think our customers are going to prefer this approach because, let’s face it, we ship code way more often than we reconfigure systems.

Get Started

  • gem install engineyard
  • cd ~/myapp
  • ey deploy

One of the things we like most about the new CLI is that it shows you, in real time, what’s going on with your deploy. If something goes wrong, you don’t have to scroll through a huge log in your browser; the error messages are right there in your terminal. When it succeeds, the process exits, so you know immediately that it’s done. No more staring at the dashboard waiting for a spinning dot to turn into a green one. How about ey deploy && mpg123 woohoo.mp3 || mpg123 sad-trombone.mp3? That’s immediate, unmistakable, annoying, audible feedback. You can’t get that from a green dot.
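
If you would rather script that feedback in Ruby than in the shell, a minimal sketch might look like the following. It relies only on Kernel#system, which returns true when a command exits with status zero. The deploy_feedback helper and the commented-out usage at the end are hypothetical illustrations, not part of the engineyard gem, and the current Ruby interpreter stands in for ey deploy so the example runs anywhere.

```ruby
require "rbconfig"

# Hypothetical helper: run a command and report :success or :failure
# based on its exit status. Kernel#system returns true for a zero exit
# status, false for a non-zero one, and nil if the command could not run.
def deploy_feedback(*command)
  system(*command) ? :success : :failure
end

# The current Ruby interpreter stands in for `ey deploy` here.
ok  = deploy_feedback(RbConfig.ruby, "-e", "exit 0")
bad = deploy_feedback(RbConfig.ruby, "-e", "exit 1")
puts "zero exit status:     #{ok}"
puts "non-zero exit status: #{bad}"

# Assumed real-world usage (not run here):
#   sound = deploy_feedback("ey", "deploy") == :success ? "woohoo.mp3" : "sad-trombone.mp3"
#   system("mpg123", sound)
```

One difference from the shell one-liner: with a && b || c, the sad trombone also plays if the success sound itself fails to play, whereas this sketch picks a sound based on the deploy's exit status alone.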

Other Great Features

You can do a lot more than just deploy with the engineyard gem. Check out the docs and the FAQ.

Go forth and ship!

Concurrency and the AASM Gem

By Xavier Shay | July 19th, 2010 at 9:07AM

Hello all,

The Engine Yard blog is back in action after taking a break following JRuby 1.5, Rubinius 1.0, the introduction of xCloud, RailsConf and (very soon) Rails 3.

Our latest post is from a special guest and Engine Yard partner Xavier Shay. He’ll be running training sessions on ‘using your database to make your Ruby on Rails applications rock solid’ at Engine Yard’s San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.

Your Ruby on Rails code is run concurrently, whether you like it or not.

Concurrency is a staple term when talking about hosting infrastructure, but it is too often brushed aside when discussing actual code bases. This attitude is especially prevalent in the Ruby on Rails community: I can’t name one popular plugin that gets it right. In this post I will address problems with the typical state machine pattern used by Rails applications, and show you how to address them and make your code bullet-proof.

The Problem

Consider the following controller action, backing a big green “ship” button next to a purchase order:

def ship
  @order = PurchaseOrder.find(params[:id])
  @order.ship!
  redirect_to order_path(@order)
end

Imagine two users both press the “ship” button at the same time. (Or, as often happens, one user double-clicks the button.) The two requests will hit the load balancer and be distributed out to run on different processes. What happens when the above code (typical of many Rails applications) is run in two different places at the same time?

Both processes will load the order from the database at line 2 (the find call). At line 3, when the ship! method is run, both processes will check the attributes of the order and see that it is currently unshipped. Both therefore execute the shipping code, which may include sending emails, updating caches, and transferring funds. As a result, the customer will receive duplicate emails, or worse, be charged twice. All versions of acts_as_state_machine (AASM) exhibit this behavior.
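
To see the race without a database, here is a minimal single-process sketch of that interleaving. Everything in it is a stand-in: the ROWS hash plays the purchase_orders table, OUTBOX records the emails, and SnapshotOrder is a hypothetical class that, like a loaded ActiveRecord object, holds a copy of the row as it was at load time. It illustrates the ordering of events only, not AASM’s actual implementation.

```ruby
# Stand-ins for the database table and the mail queue (illustration only).
ROWS   = { 1 => { state: "unshipped" } }
OUTBOX = []

# Hypothetical model: each instance is a snapshot of the row at load time.
class SnapshotOrder
  attr_reader :id, :state

  def self.find(id)
    new(id, ROWS.fetch(id)[:state])
  end

  def initialize(id, state)
    @id = id
    @state = state
  end

  def shipped?
    @state == "shipped"
  end

  def ship!
    return if shipped?                  # checks the snapshot, not the table
    OUTBOX << "Order #{id} has shipped" # the side effect we want exactly once
    @state = "shipped"
    ROWS[id][:state] = @state           # write the new state back
  end
end

# Both requests load the order before either one ships it.
first_request  = SnapshotOrder.find(1)
second_request = SnapshotOrder.find(1)

first_request.ship!
second_request.ship!  # still sees "unshipped" in its stale snapshot

puts "emails sent: #{OUTBOX.size}"
```

Because the second request checks its own stale copy, the guard in ship! never fires and the side effect runs twice.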

The Fix

Any time you read data from the database with the intention of making changes based on that data (“ship the order if it isn’t already shipped”) you must obtain an exclusive database lock on the row (or employ some form of optimistic locking strategy when updating, a topic not covered in this post). The database will block any other session that tries to lock or modify that row until the session that obtained the lock concludes its transaction (COMMIT or ROLLBACK). ActiveRecord allows us to do this using the :lock flag:

def ship
  PurchaseOrder.transaction do
    @order = PurchaseOrder.find(params[:id], :lock => true)
    @order.ship!
  end
  redirect_to order_path(@order)
end

Working through the above example again, the first process to execute the find will issue the following SQL:

SELECT * FROM purchase_orders WHERE id = 1 FOR UPDATE

Notice the “FOR UPDATE” on the end; this instructs the database to place an exclusive lock on the row. When the second process executes the find and submits the above SQL to the database, the database will wait for the first transaction to complete (after calling ship! and updating the state of the order) before reading and returning the row. The returned row will now have a state of “shipped”, and as such the ship! method will effectively be a noop (no operation). The customer will only receive one email.
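
A toy model can illustrate why the lock helps. In this sketch, which assumes nothing about how any particular database implements locking, a Ruby Mutex plays the role of the exclusive row lock. The with_row_lock and ship helpers are hypothetical stand-ins for a transaction that begins with SELECT ... FOR UPDATE, and the two threads stand in for the two processes.

```ruby
require "thread"

LOCKED_ROWS = { 1 => { state: "unshipped" } } # stand-in for the table
ROW_LOCKS   = { 1 => Mutex.new }              # one "row lock" per id
SENT_EMAILS = Queue.new                       # thread-safe record of side effects

# Hypothetical stand-in for: BEGIN; SELECT ... FOR UPDATE; ...; COMMIT
def with_row_lock(id)
  ROW_LOCKS.fetch(id).synchronize do
    yield LOCKED_ROWS.fetch(id) # the row is read only once the lock is held
  end
end

def ship(id)
  with_row_lock(id) do |row|
    next if row[:state] == "shipped" # the second caller sees the new state
    SENT_EMAILS << "Order #{id} has shipped"
    sleep 0.01                       # pretend shipping takes a while
    row[:state] = "shipped"
  end
end

# Two concurrent "requests" for the same order.
threads = 2.times.map { Thread.new { ship(1) } }
threads.each(&:join)

puts "emails sent: #{SENT_EMAILS.size}"
```

Whichever thread wins the lock does the shipping; the other waits, then reads the updated state and skips the side effects.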

It is also possible using ActiveRecord to lock an object that has already been loaded from the database:

def ship
  @order = PurchaseOrder.find(params[:id])
  PurchaseOrder.transaction do
    @order.lock!
    @order.ship!
  end
  redirect_to order_path(@order)
end

The lock! call is equivalent to a reload, but adds the “FOR UPDATE” suffix necessary for a database lock. It costs an extra SQL statement (the order is selected twice), but it is an easier pattern to abstract away:

class PurchaseOrder < ActiveRecord::Base
  # This method is usually provided by AASM
  def ship!
    return if shipped?
    # Important emails and computations
  end

  # Wraps the original ship! in a transaction that locks the row first
  def ship_with_lock!
    transaction do
      lock!
      ship_without_lock!
    end
  end
  alias_method_chain :ship!, :lock
end

With alias_method_chain, we can continue to use exactly the same controller code we started with (just a plain call to ship!), and locking is handled for us in the background.
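
A note for readers on newer Rails: alias_method_chain was deprecated in Rails 5.0 and removed in 5.1, and Ruby’s Module#prepend is the usual replacement. The sketch below shows the same wrapping idea in plain Ruby. PlainOrder and ShipWithLock are hypothetical names, and transaction and lock! are stubbed to record calls so the example runs without Rails; in a real model they would be the ActiveRecord methods.

```ruby
# Hypothetical plain-Ruby model; transaction and lock! are stubs that
# record what was called, standing in for the ActiveRecord methods.
class PlainOrder
  attr_reader :log

  def initialize
    @log = []
    @shipped = false
  end

  def shipped?
    @shipped
  end

  def transaction
    @log << :begin
    result = yield
    @log << :commit
    result
  end

  def lock!
    @log << :lock
  end

  # Stand-in for the method usually provided by AASM.
  def ship!
    return if shipped?
    @log << :ship # important emails and computations would go here
    @shipped = true
  end
end

# Prepended modules sit in front of the class in method lookup, so this
# ship! runs first and `super` calls the original.
module ShipWithLock
  def ship!
    transaction do
      lock!
      super
    end
  end
end

PlainOrder.prepend(ShipWithLock)

order = PlainOrder.new
order.ship!
p order.log
```

As with alias_method_chain, callers keep using a plain ship! and the locking happens around it.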

Lost updates or duplicate execution won’t be a problem for every website, but if you are starting to worry about the concurrency of your hosting infrastructure, it’s worth having a look over your code too.

If you’d like to join me for some hands-on work with this, I’ll be running classes at Engine Yard’s San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.
