MongoDB: A Light in the Darkness! (Key Value Stores Part 5)

By Kirk Haines | September 24th, 2009 at 11:09AM

The universe was dark and chaotic. Bits of broken matter swirled everywhere, illuminated by flashes of explosive light, and the rare gleam of something brighter and more persistent. Those bright lights of persistence always seemed to be shrouded in a miasma of cosmic dust. Then it happened!

A twist of gravimetric interplay pulled two of these lights towards each other, where they swirled and danced for a time prior to crashing into each other. That cosmic convergence showered the surrounding space with illumination as the resulting maelstrom of persistence coalesced towards stability and slashed through the miasma, shining a new light on the cosmos. That new light of persistence was good, and was called MongoDB.

MongoDB can be thought of as the goodness that erupts when a traditional key-value store collides with a relational database management system, mixing their essences into something that’s not quite either, but rather something novel and fascinating.

MongoDB is a document-oriented database. If you haven’t used one before, that may sound strange, but it’s really pretty simple. A document is a set of keys and values that, together, represent a larger set of data. Conceptually, it’s a lot like a table with a free form schema. If you have used Tokyo Cabinet tables, they are functionally similar. It’s a very useful paradigm because it allows you to store and then access your data in a simple, direct, and flexible way.
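To make the idea of a free-form schema concrete, here is a minimal sketch in plain Ruby, with no MongoDB connection involved. The two hashes and the exchange field are illustrative assumptions; the only point is that documents in the same collection need not share the same keys.

```ruby
# Two "documents" destined for the same collection, modeled as plain Ruby hashes.
# The field names are illustrative; nothing here talks to MongoDB.
stock = {
  'ticker' => 'GOOG',
  'name'   => 'Google Inc.'
}

quote = {
  'ticker'     => 'GOOG',
  'exchange'   => 'NASDAQ',
  'price_date' => '2009-09-21'
}

# Conceptually, a collection is just a set of such documents.
collection = [stock, quote]

# Each document carries its own keys; nothing forces them to match.
shared_keys = stock.keys & quote.keys
all_keys    = (stock.keys | quote.keys).sort

puts "documents:   #{collection.size}"
puts "shared keys: #{shared_keys.inspect}"
puts "all keys:    #{all_keys.inspect}"
```

A relational table would need every column declared up front; here each hash simply carries whatever keys it has.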

Installing MongoDB is simple. Just hit http://www.mongodb.org/display/DOCS/Downloads, and download the appropriate package for your platform. Then:

    mkdir -p /data/db
    tar -xvzf PACKAGE
 
    ./mongodb-xxxxxxx/bin/mongod &

At that point, you have a running instance of MongoDB. Now, try a simple interaction with it:

    ./mongodb-xxxxxxx/bin/mongo
 
     > db.foo.save( { a : 1 } )
     > db.foo.findOne()

Awesome! You’re off to the races.

MongoDB drivers are available for all of the major languages, making it a good choice for a system that has to work in a polyglot environment. The Ruby package is a gem known as mongodb-mongo. To install it, first make sure rubygems knows that gems.github.com is a valid source for gems:

    gem source --list

Add gems.github.com if it isn’t shown in that list:

    gem source --add http://gems.github.com

Then install:

    gem install mongodb-mongo

Or, if you want to install the version that uses a C extension for better performance:

    gem install mongodb-mongo_ext

Using MongoDB is Simple

>> require 'rubygems'; require 'mongo'
=> true
>> include XGen::Mongo::Driver
=> Object
>> db = Mongo.new.db('finance')
=> #<XGen::Mongo::Driver::DB:0x2a98da7038 @socket=#<TCPSocket:0x2a98da5be8>, @port=27017,
   @auto_reconnect=nil, @semaphore=#<Object:0x2a98da6ed0 @mu_waiting=[], @mu_locked=false>, @name="finance",
   @nodes=[["localhost", 27017]], @host="localhost", 
   @strict=nil, @pk_factory=nil, @slave_ok=nil>
 
>> collection = db.collection('stocks')
=> #<XGen::Mongo::Driver::Collection:0x2a98d94208 @name="stocks", @hint=nil,
   @db=#<XGen::Mongo::Driver::DB:0x2a98da7038
   @socket=#<TCPSocket:0x2a98da5be8>, @port=27017,
   @auto_reconnect=nil, @semaphore=#<Object:0x2a98da6ed0 @mu_waiting=[], @mu_locked=false>, @name="finance",
   @nodes=[["localhost", 27017]], @host="localhost", 
   @strict=nil, @pk_factory=nil, @slave_ok=nil>>
 
>> stock = {'ticker' => 'GOOG',
>> 'name' => 'Google Inc.',
>> 'cusip' => '38259P508',
>> 'reference' => 'http://www.google.com/finance?q=goog'}
=> {"reference"=>"http://www.google.com/finance?q=goog", 
    "name"=>"Google Inc.", "cusip"=>"38259P508", "ticker"=>"GOOG"}
 
>> collection.insert stock
=> {"reference"=>"http://www.google.com/finance?q=goog",
    "name"=>"Google Inc.", "cusip"=>"38259P508", "ticker"=>"GOOG"}

That’s all there is to it. Just insert your hash representation of your document, and it’ll be stored for you. To retrieve one or more documents, use the #find method:

>> cursor = collection.find('ticker' => 'GOOG')
=> #<XGen::Mongo::Driver::Cursor:0x2a98d28940 @closed=false,
   @query=#<XGen::Mongo::Driver::Query:0x2a98d28af8
   @order_by=nil, @fields=nil, @number_to_return=0,
   @selector={"ticker"=>"GOOG"}, @hint=nil,
   @number_to_skip=0, @explain=nil>, @rows=nil, @cache=[],
   @query_run=false, @num_to_return=0, @can_call_to_a=true,
   @db=#<XGen::Mongo::Driver::DB:0x2a98da7038 @socket=#<TCPSocket:0x2a98da5be8>, @port=27017,
   @auto_reconnect=nil, @semaphore=#<Object:0x2a98da6ed0 @mu_waiting=[], @mu_locked=false>, @name="finance",
   @nodes=[["localhost", 27017]], @host="localhost",
   @strict=nil, @pk_factory=nil, @slave_ok=nil>,
   @collection=#<XGen::Mongo::Driver::Collection:0x2a98d94208 @name="stocks", @hint=nil,
   @db=#<XGen::Mongo::Driver::DB:0x2a98da7038
   @socket=#<TCPSocket:0x2a98da5be8>, @port=27017,
   @auto_reconnect=nil, @semaphore=#<Object:0x2a98da6ed0 @mu_waiting=[], @mu_locked=false>, @name="finance",
   @nodes=[["localhost", 27017]], @host="localhost",
   @strict=nil, @pk_factory=nil, @slave_ok=nil>>>
 
>> cursor.next_object.inspect
=> "{\"_id\"=>#<XGen::Mongo::Driver::ObjectID:0x2a98ce9448 @data=[74, 184, 252, 71, 34, 116, 195, 23, 83, 115, 44, 164]>,
   \"reference\"=>\"http://www.google.com/finance?q=goog\",
   \"name\"=>\"Google Inc.\", \"cusip\"=>\"38259P508\",
   \"ticker\"=>\"GOOG\"}"

As you can see in the example above, #find is simple. It takes a hash that describes the keys to search and the values to search for in them. It returns a cursor object that can be enumerated to retrieve the results. So when a find operation returns many records, you could do something like this:

collection.find('price_date' => '2009-09-21').each do |stock|
  # do stuff with stock
end

If your query should only return a single data item, or you only care about the first of a set of data that might match, you can use #find_first, like this:

>> collection.find_first('ticker' => 'GOOG')
=> {"_id"=>#<XGen::Mongo::Driver::ObjectID:0x2a98ccd090 @data=[74, 184, 252, 71, 34, 116, 195, 23, 83, 115, 44, 164]>,
   "reference"=>"http://www.google.com/finance?q=goog",
   "name"=>"Google Inc.", "cusip"=>"38259P508", "ticker"=>"GOOG"}

Notice that the returned data contains one additional field that was not in the record we inserted. MongoDB reserves all fields that start with the _ character for internal use. The _id field is a unique identifier for that document. It receives special indexing and treatment by MongoDB in order to make many db operations more efficient.
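The @data array in that output holds 12 bytes. Assuming the usual ObjectId layout, in which the first four bytes are a big-endian Unix timestamp recording when the id was generated, the creation time can be recovered with a few lines of plain Ruby. This is a sketch under that assumption, and the variable names are only for illustration.

```ruby
# The 12 bytes from the @data array in the output above.
data = [74, 184, 252, 71, 34, 116, 195, 23, 83, 115, 44, 164]

# Assumption: bytes 0..3 form a big-endian Unix timestamp in seconds.
seconds = data[0, 4].inject(0) { |acc, byte| (acc << 8) | byte }

# Interpret the seconds as a UTC time.
created_at = Time.at(seconds).utc
puts created_at.strftime('%Y-%m-%d %H:%M:%S UTC')
# prints "2009-09-22 16:33:11 UTC"
```

The decoded date falls two days before this post was published, which is consistent with the assumed layout.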

So, if you’re like me, you’re looking at these examples and wondering how you move beyond find_first(FIELD => VALUE), which is obviously limited to searching only for exact matches. MongoDB has you covered:

  • Boolean searches: collection.find({'price' => {'$gt' => 10.00}})
  • Regular expressions: collection.find({'ticker' => /^MS/})
  • Sets: collection.find({'ticker' => {'$in' => ['GOOG','YHOO']}})
  • Sorting and limiting: collection.find({'cusip' => {'$gt' => '580'}}, {:limit => 100, :sort => 'ticker'})

In this way, MongoDB provides much of the query capability of a SQL database.
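To show how those selector hashes are read, here is a minimal in-memory sketch in plain Ruby. It is not MongoDB's implementation and does not use the driver: selector_match? and find_docs are hypothetical helpers, the sample prices are made up, and only exact matches, regular expressions, $gt and $in are handled.

```ruby
# In-memory sketch of selector matching. This only mimics the semantics of
# exact match, Regexp, $gt and $in; it is not how MongoDB evaluates queries.
def selector_match?(doc, selector)
  selector.all? do |key, condition|
    value = doc[key]
    case condition
    when Regexp
      !value.nil? && !(condition =~ value.to_s).nil?
    when Hash
      condition.all? do |op, operand|
        case op
        when '$gt' then !value.nil? && value > operand
        when '$in' then operand.include?(value)
        else raise ArgumentError, "unsupported operator: #{op}"
        end
      end
    else
      value == condition
    end
  end
end

# Hypothetical stand-in for collection.find over an array of hashes.
def find_docs(docs, selector)
  docs.select { |doc| selector_match?(doc, selector) }
end

# Sample documents; the prices are made up for illustration.
docs = [
  { 'ticker' => 'GOOG', 'price' => 500.0 },
  { 'ticker' => 'YHOO', 'price' => 17.0 },
  { 'ticker' => 'MSFT', 'price' => 25.0 },
  { 'ticker' => 'F',    'price' => 7.5 }
]

tickers = lambda { |result| result.map { |doc| doc['ticker'] } }

p tickers.call(find_docs(docs, 'price' => { '$gt' => 10.00 }))
p tickers.call(find_docs(docs, 'ticker' => /^MS/))
p tickers.call(find_docs(docs, 'ticker' => { '$in' => ['GOOG', 'YHOO'] }))
```

Every key in the selector must match for a document to be returned, so a selector with several keys behaves like a SQL WHERE clause joined with AND.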

While you can query the document store on any key, if there are keys that you expect to be doing a lot of queries with, you should create an index on that key. Doing so dramatically increases the speed at which the data can be queried, especially when there is a lot of it. To do so:

>> collection.create_index('key')
=> "key_1"
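Why an index helps can be illustrated without a database. The sketch below is plain Ruby and says nothing about how MongoDB implements indexes internally: a Hash stands in for the index, and scan and build_index are hypothetical helpers. It contrasts examining every document with looking a value up in a structure that was built once.

```ruby
# Illustrative documents (tickers only).
docs = %w[GOOG YHOO MSFT IBM].map { |ticker| { 'ticker' => ticker } }

# Without an index: examine documents one by one until one matches.
def scan(docs, key, value)
  docs.find { |doc| doc[key] == value }
end

# With an "index": build a lookup structure once, keyed on the field.
# A Hash stands in for the real index structure here.
def build_index(docs, key)
  index = Hash.new { |hash, k| hash[k] = [] }
  docs.each { |doc| index[doc[key]] << doc }
  index
end

ticker_index = build_index(docs, 'ticker')

by_scan  = scan(docs, 'ticker', 'MSFT')
by_index = ticker_index['MSFT'].first

# Both approaches find the same document; only the work done differs.
puts by_scan.equal?(by_index)
```

The scan cost grows with the number of documents, while the lookup cost does not, which is why the payoff is largest on big collections.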

In addition to its key-value-like storage capabilities, MongoDB has one other interesting capability that I want to reveal. It offers a GridFS storage system that lets people store complete files within the database. The Ruby library for Mongo that provides access to this capability is called mongo/gridfs. It essentially permits you to do file IO into and out of a MongoDB database.

>> require 'rubygems'; require 'mongo'; require 'mongo/gridfs'
=> true
>> include XGen::Mongo::Driver
=> Object
>> include XGen::Mongo::GridFS
=> Object
>> db = Mongo.new.db('finance')
=> #<XGen::Mongo::Driver::DB:0x2a98d41800 @auto_reconnect=nil, @host="localhost",
   @semaphore=#<Object:0x2a98d416c0 @mu_waiting=[],
   @mu_locked=false>, @name="finance",
   @nodes=[["localhost", 27017]], @strict=nil,
   @pk_factory=nil, @slave_ok=nil,
   @socket=#<TCPSocket:0x2a98d40928>, @port=27017>
 
>> GridStore.open(db,'testfile','w+') {|fh| fh.puts "This is a test."}
=> nil
 
>> GridStore.open(db,'testfile','r') {|fh| puts fh.read}
This is a test.
=> nil
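Conceptually, GridFS stores a file as a sequence of fixed-size chunks rather than as one large value. The sketch below is plain Ruby and does not use the mongo/gridfs API: split_into_chunks, join_chunks, the 'n' and 'data' keys as used here, and the tiny chunk size are all assumptions chosen to keep the example readable.

```ruby
# Hypothetical chunk size, kept tiny so the result is easy to inspect.
CHUNK_SIZE = 8

# Split data into numbered chunks, slicing a whole chunk at a time.
def split_into_chunks(data, chunk_size = CHUNK_SIZE)
  chunks = []
  offset = 0
  while offset < data.bytesize
    chunks << { 'n' => chunks.size, 'data' => data.byteslice(offset, chunk_size) }
    offset += chunk_size
  end
  chunks
end

# Reassemble the original data by concatenating the chunks in order.
def join_chunks(chunks)
  chunks.sort_by { |chunk| chunk['n'] }.map { |chunk| chunk['data'] }.join
end

original = "This is a test.\n"
chunks   = split_into_chunks(original)

puts chunks.size                      # prints 2 (16 bytes in chunks of 8)
puts join_chunks(chunks) == original  # prints true
```

Slicing whole chunks keeps the number of operations proportional to the number of chunks rather than the number of bytes.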

As you can see, MongoDB is very easy to use. It is not a screaming speed demon like a simple key-value store (such as Redis or Tokyo Cabinet), but it performs more than adequately. On commodity Linux hardware, tests showed about 2,500 simple document insertions per second, and about 2,800 reads per second using the gem without the C extension.

MongoDB does not yet have production-quality sharding, but there is now alpha-level support for automatic sharding, so it’s only a matter of time before MongoDB enters the realm of being a fully production-ready, horizontally scalable key-value document store.

MongoDB’s charm is that it mixes a very powerful, expansive query model with a free-form key-value-like data store, while still giving adequate performance. It is ideal for storing documents in a database. Query syntax isn’t the prettiest thing around, but with an ease of use that rivals that of Redis, MongoDB should be a strong contender if you have complex data storage needs.

17 Responses to “MongoDB: A Light in the Darkness! (Key Value Stores Part 5)”

  1. A nice wrapper to MongoDB is John Nunemaker's MongoMapper:

    http://railstips.org/2009/6/27/mongomapper-the-ra...

    We've been using it for the past month or so for a new project. MongoDB really shines because it naturally fits in with resource-oriented architectures via an object-oriented language.

  2. dm_10gen says:

    nice article. the limiter on the insert and read speed *might* be the ruby client's speed; i think that is something that can be optimized further. would be curious what mongod process CPU is during testing…

    • Mike Dirolf says:

      I'm sure it's the client speed. Would be interesting to see results with the C extension installed, but even so there is some optimization that can still be done. Expect some improvements in the next couple of weeks…

    • Kirk Haines says:

      Yeah. I probably should have elaborated on that more. I'm sure that using the driver with the C extension would net better perf, and while I haven't gone into it in depth, I would bet that there are other perf gains to be found in the Ruby code.

    • ehsanul says:

      Yeah, MongoDB has a list of benchmarks done with various drivers. There are absolutely huge differences between the different drivers in different languages. The page is here: http://www.mongodb.org/display/DOCS/Performance+T...

  3. Mike Dirolf says:

    Great article by the way! Especially enjoyed those opening paragraphs…

  4. It seems like yesterday I had Ezra helping me out on a Saturday to get MongoDB working on an EY Solo instance! (Turns out that was back in May.)

    We have been using Mongo in a couple of different Twitter apps since April and have thoroughly enjoyed it.

    Excellent post!

  5. Bill Doughty says:

    I tested MongoDB for a project I'm working on that requires a document storage facility to store large text files and some associated attributes. GridFS seemed like an ideal solution.

    However, I found it to be terribly slow. My tests uncovered that the Ruby GridFS driver does all the work of chopping data between the GridChunks. Furthermore, it does this one byte at a time. There is about an 8 function call stack to write every single character – in Ruby code. That gets pretty slow for large files.

    There appears to be much room for optimization in the Ruby GridFS driver. I may tackle it myself when I get some time. For now, I opted to go with storing the actual files on the native filesystem and storing the attributes and a pointer to the file in MySQL.

    I read that some people are using GridFS in production for large media files. I wonder if anyone else has experienced this slowness, or how many people are actually using the Ruby driver for MongoDB's GridFS feature. It is a very nice concept.

    • mdirolf says:

      There is definitely room for performance improvements to the ruby GridFS client. We'll be working on the performance of the Ruby client in general over the next couple of weeks, but any contributions are more than welcome ;)

    • akdubya says:

      It's not just you. GridFS has been quite slow for me as well even after fiddling with the chunk size. Here's a relevant gist: http://gist.github.com/159487. I agree that the driver is doing too much data shuffling in pure ruby.

      For now I'm using Tokyo Tyrant for distributed blob storage but I'd love to be able to simplify my config.

      • Nuanda says:

        Around a month ago I had the same problem with GridFS performance in mongodb Ruby driver. I hacked away a small "extension" of mine which I use now for faster writes and reads to/from the GridFS. Here is a gist link:

        http://gist.github.com/214124

        (I apologize for not being able to switch Ruby highlighting on and any other thing which went wrong with this gist – this is my first gist :)).

        The idea is to add another set of fast_read, fast_write methods so one may use them instead – the methods use an "entire-chunk-at-once" approach. It has worked for me so far (but not in any kind of production installation, just in a private project), so I thought I'd post it here; maybe it will be useful for you or someone else.

  6. DavidM says:

    We moved our large (hundreds of GBs) data storage backend from MySQL to MongoDB a few months back, which may be of interest – http://blog.boxedice.com/2009/07/25/choosing-a-no...

  7. Jason says:

    Just wondering, why would I want to use this over Tokyo DB? From what I understand Tokyo DB is faster and more flexible, but I don't really know.

    • Kirk Haines says:

      For an apples to apples comparison, you would need to compare Tokyo's table support to Mongo. I think when you do that, and especially if both have more than a tiny amount of data in them, Tokyo won't have a clear speed advantage. I haven't personally compared them in that way, but that's my hunch.

    • Jim Van Fleet says:

      Here's something that was quite important for our use case that Tokyo* did not support.

      http://groups.google.com/group/rufus-ruby/browse_...

      In general, TokyoCabinet is more of a competitor for the key-value stores rather than the document-database model that CouchDB and MongoDB represent.

  8. Pedro Marban says:

    Version 0.18.2 2009-12-29
    Significant GridStore performance improvement (thx., Sunny Hirai)
    No more one byte at a time writes/reads!