Pragmatic Polyglot Persistence with Rails

By | August 23rd, 2010 at 9:08AM
This post comes from guest community contributor Kent Fenwick. Kent is the tech co-founder of Viewpointr, a personalized Q&A service that aims to provide an easy way to get and give help. When he isn’t programming, he spends time with his family and friends in Toronto. Kent writes here and can be followed on Twitter at @kentf.

It’s getting more and more difficult to pick a persistence layer for your web application. When I started in Rails four years ago, there was really only one option, MySQL. Now, there are many more, each with their own pros and cons. Some are new and some are old, some are tested, and others, not so much. What’s clear is that when you are building a business around data, you want to make good decisions. That being said, often only the future will tell if you’ve made the right ones. I want to share with you my persistence story about how I ended up getting the best of both worlds.

h2. The Problem

There are too many choices, and each choice has a loud evangelist of its own. When designing Viewpointr I went back and forth daily between MongoDB, MySQL, PostgreSQL and Cassandra. Viewpointr is essentially Twitter with a focus on helping people. Therefore, we have some common data elements: a user-specific timeline, a user-specific list of people they are helping, and a user-specific list of people helping them. Because I am ambitious, I would find myself asking questions like:

bq. “Hmm… but will MySQL scale to 1,000,000 records?”

Looking back on these internal conversations I find them funny; programmers always tend to think big. However, these are real concerns that developers and teams think about. While planning I would constantly consult the blogosphere for help, and to see what others were doing. Kirk Haines of Engine Yard wrote a great series of NoSQL posts highlighting and comparing different key-value stores and explaining their pros and cons. Since then, there has been a flurry of articles each week outlining different NoSQL datastores, NoSQL vs. MySQL debates and flamewars etc.

h2. The Opportunity

Data is not created equal, and this is a good thing. When programming, we do not use an array for every “list” type problem; sometimes a hash or a linked list suits the problem better. We need to start thinking about data the same way. This was the best decision we made at Viewpointr and it allowed us to move forward at a great pace.

I looked at our application and broke it down into components. Viewpointr has many typical CRUD features, similar to most Rails apps. These are very well suited to MySQL and a relational database. Being able to pull a list of answers for a given question using simple, optimized SQL that I understand is a big win. However, there are some things that a relational database doesn’t model well.

Friendships. The simplest way to model friendship using a relational database is to create a relation that refers to the same table with two different names. Let’s say you have a users table and you want to model Twitter-like friendship where User:1 can befriend User:2 without User:2’s permission. It’s easy enough.

class Friend < ActiveRecord::Base
 
 belongs_to :user
 belongs_to :contact, :class_name => "User", :foreign_key => "contact_id"
 
 # user befriends contact
 def self.befriend(user,contact)
    relationship = find_by_user_id_and_contact_id(user.id, contact.id)
    if relationship.nil?
      transaction do
        Friend.create(:user => user, :contact => contact)
      end
    end
 end
 
end
 
class User < ActiveRecord::Base
 
  has_many :friends, :dependent => :destroy
  has_many :contacts, :through => :friends, :order => "created_at DESC", :dependent => :destroy
 
end

However, I have always felt that it’s clumsy. What I really want to say is:

“Each user has a list of IDs that represent the people that they are friends with.”

Sounds like a de-normalized list right?
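
In plain Ruby terms, that sentence describes a hash that maps each user ID to a set of contact IDs. Here is a minimal in-memory sketch of that shape. The names `helping` and `befriend` are hypothetical, and it uses the standard library's `Set` rather than any database:

```ruby
require "set"

# Hypothetical in-memory model: user ID => set of IDs that user is helping.
# Hash.new with a block creates an empty set the first time a key is read.
helping = Hash.new { |hash, user_id| hash[user_id] = Set.new }

# Record that `user_id` befriends `contact_id`. No permission is needed
# from the contact, so only one side of the relationship is written.
befriend = lambda do |user_id, contact_id|
  helping[user_id].add(contact_id)
end

befriend.call(1, 2)
befriend.call(1, 3)
befriend.call(1, 2) # duplicate: the set keeps a single copy

friend_ids = helping[1].to_a.sort
```

There is no join table and no duplicate check here; the set itself guarantees uniqueness.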

h2. The Solution

Enter Redis. Redis is a key-value store similar to memcached, but more flexible, since lists, sets, sorted sets and strings can all be used as values. Thanks to its simple API, the problem I described is essentially an atomic operation in Redis. Redis has a great “set” implementation and allows you to do all of the things you would expect a set to do: addition, subtraction, unique insertion, deletion, union, intersection, etc.
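
These set operations can be tried without a Redis server. The sketch below uses Ruby's own `Set` as a stand-in. The Redis command named in each comment (SADD, SREM, SINTER, SUNION, SDIFF) is the server-side counterpart, and the user IDs are made up for illustration:

```ruby
require "set"

# IDs of the people two hypothetical users are helping.
alice_helping = Set.new
bob_helping   = Set.new

# Unique insertion (Redis: SADD). Adding 7 twice leaves one copy.
[7, 8, 9, 7].each { |id| alice_helping.add(id) }
[8, 9, 10].each   { |id| bob_helping.add(id) }

# Deletion (Redis: SREM).
bob_helping.delete(10)

in_common  = alice_helping & bob_helping  # intersection (Redis: SINTER)
either     = alice_helping | bob_helping  # union        (Redis: SUNION)
only_alice = alice_helping - bob_helping  # difference   (Redis: SDIFF)
```

In a Twitter-like app, the intersection is the "people we both help" query, which would otherwise need a self-join on the friends table.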

The operation will ultimately look like this:

SET = Redis.new
SET.set_add key, value

However, since we are working inside a Rails app, we need to make sure we have the right plumbing set up.

# Create a redis.rb in your initializers folder.
# Create a new Redis database for each of your needs.

In our case, we want one dataset that keeps track of a User’s helpers (other users who are helping them) and another that holds a User’s friends (other users that the user is helping). Since we are going to be using these Redis objects throughout the codebase, I like to declare them as global constants in the redis.rb initializer file.

HELPERS = Redis.new(:db => 0)
HELPING = Redis.new(:db => 1)

Notice that I pass in the :db key so that HELPERS and HELPING point at two different Redis databases. You can use the redis-namespace gem if you want, but I find the default syntax from the redis-rb gem works well enough for my purposes.

Now that we have these global Redis objects at our disposal throughout the application, we can start using them in our Friend.befriend method.

class Friend < ActiveRecord::Base
 
 belongs_to :user
 belongs_to :contact, :class_name => "User", :foreign_key => "contact_id"
 
 # user befriends contact
 def self.befriend(user,contact)
    begin
     HELPERS.set_add contact.id, user.id
     HELPING.set_add user.id, contact.id
    rescue
     RedisLogger.info "Redis Exception"
    end
 end
 
end
 
class User < ActiveRecord::Base
 
  has_many :friends, :dependent => :destroy
  has_many :contacts, :through => :friends, :order => "created_at DESC", :dependent => :destroy
 
end

However, this isn’t the best solution right out of the gate. Using a NoSQL datastore has some drawbacks that aren’t apparent in development mode but rear their ugly heads in production. If you are not careful, a simple restart of your Redis server can cause you to lose all your data. Managing your Redis data in production deserves its own post (coming soon), but for now, let’s create a safer solution that you can gradually roll out as you become more comfortable with storing, backing up and using Redis data files.

class Friend < ActiveRecord::Base
 
 belongs_to :user
 belongs_to :contact, :class_name => "User", :foreign_key => "contact_id"
 
 # user befriends contact
 def self.befriend(user,contact)
    relationship = find_by_user_id_and_contact_id(user.id, contact.id)
    if relationship.nil?
      transaction do
        Friend.create(:user => user, :contact => contact)
      end
      # mirror the new row in Redis once the ActiveRecord write is done
      add_to_denormalized_list(user,contact)
    end
 end
 
  def self.add_to_denormalized_list(user,contact)
    begin
     HELPERS.set_add contact.id, user.id
     HELPING.set_add user.id, contact.id
    rescue => e
      # a Redis failure is logged, never raised; MySQL still has the row
      RedisLogger.info "Redis Exception: #{e.message}"
    end
  end
 
end
 
class User < ActiveRecord::Base
 
  has_many :friends, :dependent => :destroy
  has_many :contacts, :through => :friends, :order => "created_at DESC", :dependent => :destroy
 
end

The strategy is simple: mirror the MySQL data in Redis. By adding a call to add_to_denormalized_list, we mirror the ActiveRecord call using the simple and elegant Redis set syntax discussed above. As you and your team get more practice and become more comfortable using Redis in production, you can start writing more to the denormalized list, eventually moving this part of your application away from ActiveRecord and MySQL to Redis. You could do this manually, or you can use James Golick’s recently released gem called Rollout that uses, you guessed it, Redis, to programmatically roll out features to users.
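
One consequence of mirroring is that the relational rows remain the system of record, so the Redis sets can be rebuilt from them after a lost data file. The sketch below illustrates this under stated assumptions and is not code from Viewpointr. `FakeSetStore` is a hypothetical in-memory stand-in for a Redis connection, a plain array of `[user_id, contact_id]` pairs stands in for the friends table, and `rebuild_mirror` is an illustrative helper. The method name `sadd` follows the Redis SADD command.

```ruby
require "set"

# Hypothetical stand-in for a Redis connection. It models only set insertion
# and reads, and can be switched "down" to simulate an outage.
class FakeSetStore
  attr_accessor :down

  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
    @down = false
  end

  def sadd(key, member)
    raise IOError, "store unavailable" if down
    @sets[key].add(member)
  end

  # Sorted only to keep the result predictable; real sets are unordered.
  def smembers(key)
    raise IOError, "store unavailable" if down
    @sets[key].to_a.sort
  end
end

# Write the relational row first, then mirror it. A mirror failure is
# reported through the return value but never undoes the primary write.
def befriend(rows, helpers, helping, user_id, contact_id)
  row = [user_id, contact_id]
  rows << row unless rows.include?(row)
  helpers.sadd(contact_id, user_id)
  helping.sadd(user_id, contact_id)
  true
rescue IOError
  false
end

# Replay every relational row into the mirror, e.g. after the mirror lost data.
# Set insertion is idempotent, so replaying rows that already exist is harmless.
def rebuild_mirror(rows, helpers, helping)
  rows.each do |user_id, contact_id|
    helpers.sadd(contact_id, user_id)
    helping.sadd(user_id, contact_id)
  end
end

rows    = []
helpers = FakeSetStore.new
helping = FakeSetStore.new

first_ok  = befriend(rows, helpers, helping, 1, 2) # both stores updated
helping.down = true
second_ok = befriend(rows, helpers, helping, 1, 3) # row saved, mirror write fails
helping.down = false
rebuild_mirror(rows, helpers, helping)             # mirror catches up from the rows
```

Because the rebuild is safe to repeat, it could run as a periodic task while the team gains confidence in the Redis side.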

Like anything else you code, testing and benchmarking this process in production is crucial to make sure you are saving time and cycles. It might seem like a waste to duplicate your data in Redis, but you are a pragmatic polyglot persistence developer, right? You want to explore the NoSQL space while making sure that a little mistake or misunderstanding doesn’t sink your ship. Give something like this a try; it doesn’t get any more pragmatic. When you do try it or come up with something new, let me and everyone else know about it.

Thanks for reading.

  • http://rywalker.com/ Ryan Walker

    Timely article, Kent, based on our discussion today of building a stats backend for our new job board and integrating Redis into that – nothing wrong with a safety net when learning something new.

  • Kent Fenwick

    Sounds good Ryan. Let me know how it goes.

  • jcapote

    A great gem for integrating redis with rails is redis-objects, http://github.com/nateware/redis-objects it provides just enough abstraction to use redis easily (and efficiently) from your models.

  • http://mwilden.blogspot.com/ Mark Wilden

    Frankly, in this case, I think the ActiveRecord solution is just fine. Unless I'm missing something, the interface is the same as with the Redis solution. In fact, the only argument made in this interesting article against the AR solution is the single word: "clumsy." I would need a better rationale than that to introduce a whole new subsystem into my architecture. I'd rather spend the time providing customer value.

  • Kent Fenwick

    Thanks for sharing. Just had a look and seems really cool.

  • Kent Fenwick

    Thanks. Must have been a formatting glitch. Will do.

  • Kent Fenwick

    I see what you mean and struggled with this over and over again.

    Let me explain a bit more.

    Redis fits the way I think about the data more. The things that I envision doing with this kind of data in the future are exactly how Redis stores it. For example, one of the first things I did was create a method that performs an optimized query on my Friendship data and returns a list of unique ids. I use this a lot around the site and can easily cache it to make sure it's fast.

    However, in the future, as the site grows, this operation is going to get more expensive and "clumsy" :) So when playing with Redis and doing some benchmarking, I couldn't believe how easy it was. I was ready to switch over completely. Then I restarted my Redis server, didn't have something like rsync set up, and lost my data. Then I realized that using Redis in production will require some custom deploy scripts and some DevOps thinking. So I hooked up this dual approach.

    It's not for everyone. If you are a huge SQL fan and hate all of this crap surrounding the NoSQL movement, then I agree, it's probably not right for you. However, if you are looking to dabble, the approach described above works really well.

    Thanks for the feedback.
