Blog

A Gentle Introduction to Isolation Levels

By | July 21st, 2010 at 10:07AM

Hello all,

Our latest post is from a special guest and Engine Yard partner Xavier Shay. He’ll be running a pair of training sessions on “using your database to make your Ruby on Rails applications rock solid” at Engine Yard’s San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.

Bob opens a database transaction and selects everything from the books table. Tom comes along and adds a new book, then Bob, in his same transaction, repeats his same query for all the books. Does Bob see the new book that Tom added?

The answer is that you get to choose! It’s important to understand what your choices are (and what choice your preferred database makes for you) so that you can ensure your code executes in a way that you intend.

The SQL standard specifies levels for how “isolated” transactions running at the same time are, all the way from being able to see uncommitted changes (not isolated) to effectively running the transactions in serial (full isolation). Academically there are eight levels of isolation, but for most purposes you only need to worry about the four defined by the standard. MySQL implements all four, PostgreSQL only two. You can specify a global isolation level for your database, but also override it for individual transactions.

The easiest to understand are the extreme levels: no isolation and total isolation. The first of these is known as *read uncommitted*, and it allows Bob to read the new book that Tom is adding _even before Tom has committed his changes_. As you can imagine this level is mostly useless, however it can very occasionally be handy in some reporting situations.

At the other end of the spectrum is full isolation, known in the spec as **serializable**. Bob will never see the new book that Tom is adding until he starts a new transaction. The database Bob sees is consistent—within the one transaction, the same query will always return the same result. At first glance this level seems like a great option but there’s a lot of overhead involved, it drastically reduces the amount of concurrency you can achieve, and for most purposes the serializable level is overkill.

There are two isolation levels in between read uncommitted and serializable, they are *read committed* and *read repeatable*, and this is where it gets interesting. Read committed is the default isolation level in PostgreSQL and Oracle, and is one step up from read uncommitted. It is the most “common sense” level: Bob will not see any changes made by Tom until Tom commits them.

MySQL defaults to *read repeatable*. In this level, Bob will not see any _updates_ Tom commits, but will see any _inserts_. Say in Bob’s first select he sees one book titled “The Odessey”. Tom then fixes the spelling mistake to “The Odyssey”, and also add Homer’s other epic poem “The Iliad”. When Bob selects all books again, he will see “The Odessey” (old title, no spelling fix) and “The Iliad” (the inserted book).

To summarize, the four levels from least isolated to most isolated are: *read uncommitted*, *read committed*, *repeatable read*, and *serializable*. They define what types of changes made by Tom that Bob will be able to see within a single transaction.

In Practice

Say the books we are selecting are ordered based on an arbitrary position column (they’re on our bookshelf, for instance). Assume read committed isolation level.

Title       | Position
----------------------
The Odyssey | 1
The Iliad   | 2
The Nostoi  | 3

Bob wants to move “The Odyssey” to the bottom position. To do this, he needs to update its position to the bottom of the list (position 4), then subtract 1 from all positions. At the same time, Tom is adding a new book “The Cypria”. Working this through:

  • Bob checks the bottom position, finds it to be 4
  • Tom inserts “The Cypria” in the bottom position of 4
  • Bob updates the position of “The Odyssey” to 4
  • Bob subtracts 1 from all positions, and since he is using *read committed* he will “see” and update the newly inserted book.
  • Both “The Odyssey” and “The Cypria” have a position of 3
Title       | Position
----------------------
The Iliad   | 1
The Nostoi  | 2
The Odyssey | 3
The Cypria  | 3

If Bob had used the *serializable* level, the list would have remained consistent for his entire transaction, so his update would not have affected “The Cypria” that Tom inserted, and so would not have updated its position from 4 to 3. (In practice the way databases normally handle this is to actually abort one of the transactions with an error.)

For those using Rails, you may have recognized the above scenario as a typical @acts_as_list@ scenario, and you’d be correct. In a default configuration, the @acts_as_list@ plugin makes the same mistake outlined above, and will leave you with inconsistent data. The quickest fix is to wrap all list operations in a serializable transaction.

Book.transaction do
  Book.connection.execute("SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE")
  @book = Book.find_by_name("The Odyssey")
  @book.move_to_bottom
end

(It may have occurred to you that some locking or a unique index on position could avoid the exact scenario above, but that breaks @acts_as_list@ and fails to address some other edge cases left as an exercise for the reader. The main point for the purpose of this article is to understand why it breaks under read committed, but works under serializable.)

As a general rule, read committed is a sensible default. It’s easy to reason about, fast, and forces you to be explicit about your locking strategy. Jump up to serializable when needed, usually when dealing with ranges. MySQL’s repeatable read default can be confusing and deadlock in unintuitive ways, as such it is not recommended.

This has been a very brief introduction to the four standard SQL isolation levels: read uncommitted, read committed, repeatable read, and serializable. Hopefully it has helped you get your head around them. I’ll be going into much more detail with practical hands on exercises in my training days at Engine Yard’s San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.

  • http://www.pulscen.ru cryo28

    Nothing new to me (and I suppose to the majority of web developers). However, I like the way Xavier conveys the main idea of isolation levels – very clear and concise. Thanks for the post.

  • http://metaskills.net/ Ken Collins

    A timely great post. I author the SQL Server Adapter and we have #run_with_isolation_level block method that temporarily sets the isolation during the block and resets it back to the original isolation level afterward. I take it this is pointless in your code example because of the "SET SESSION TRANSACTION" which I am assuming lives for the duration of the transaction block?

  • Mtl3927

    Fantastic post. Really helped me out, thanks!

  • Ddd

    An explanation as to why most databases would throw an error in the book insetion example would be helpful.

  • http://xaviershay.com/ Xavier Shay

    They will only throw an error at serializable isolation level, and the “why” is because there is nothing else for them to do – there’s no way for the database to resolve the conflict automatically within the serializable definition so they throw it back to the programmer to fix. In practice, I typically retry transactions that fail with a serializable exception.

  • http://sekuda.com/ Will Bryant

    Unfortunately the summary of mysql’s (actually innodb’s) default behavior is not quite correct:

    “MySQL defaults to *read repeatable*. In this level, Bob will not see any _updates_ Tom commits, but will see any _inserts_.”

    This is not true; in the innodb default repeatable read level, Bob will not see inserts from transactions that started after his snapshot either.

    However, he will see all updates and inserts if he does a locking select!  This can make life quite complicated – you will get different results depending on whether you select FOR UPDATE or not.

    The best way to remember the behavior is that:
    1. As soon as you start doing something in an innodb repeatable read transaction, you will get a consistent read snapshot, so will see no changes from transactions committed afterwards – except that:
    2. If you do anything that takes row locks (UPDATE, DELETE, INSERT, SELECT … FOR UPDATE) then you WILL see the effects of other transactions, punching through repeatable read and making it more like read committed.

    Frankly, although innodb’s repeatable read is very useful if you want to do a set of consistent queries (for reporting or backup), it’s not the best model for most business logic code.  It was only the default because before row-based replication came along, the only way that you could have update statements that used selects replicate consistently was to use repeatable read.

  • Qertoip

    BTW, transaction_isolation gem allows to run a block of code with any transaction isolation level in a database agnostic way: https://github.com/qertoip/transaction_isolation All ANSI levels are supported (serializable, repeatable read, read committed, read uncommitted).