Key-value stores have emerged in the last few years as an alternative to relational database management systems for some data storage tasks. SimpleDB, BigTable, and Tokyo are just three of the variants. So what are they? How are they different than relational databases? And when should you use them?
Applications, whether web apps, simple dynamic websites or command line apps, frequently need some sort of persistent data store. As a result, databases have become ubiquitous on modern systems, and because of this chicken and egg relationship, programmers will often habitually reach for a relational database when the project only calls for a way to persist data.
Generally, this is fine. Relational databases are a well known technology, with numerous features that make life easy for the people who use them. And while you can quibble about the nuances, relational databases typically do a decent job with most data storage and data query tasks.
Once in a while it behooves a programmer to reach for a different tool. Perhaps you need something that can query simple data faster than a relational system can. Maybe there are scaling requirements that are difficult to meet with a relational system. Or you want to avoid tying an application to a requirement for a relational system that may well have its own maintenance needs over time. Or maybe you just want something simpler than a relational database. In all of these cases, a key-value store may be just the tool for you.
In the simplest terms, a key-value store is exactly what the name suggests: it’s a storage system that stores values, indexed by a key. The implementations are diverse, but they’re usually built on either a hash data structure, or a tree data structure, like a b-tree. Hash based key-value stores have fast lookup and update speeds, and require that keys be unique. Lookup in a hash based tree is limited to key-only lookup.
Tree based stores generally allow multiple identical keys, and because tree based structures (like the b-tree) are ordered, they allow you to query individual keys, as well as ranges of keys. The drawback to tree based stores is that those structures are generally slower than simple hashes. For example:
A log aggregation system that stores messages keyed by timestamp, but which uses a hash based store, will be troublesome to query in a useful way, despite the speed of lookup, because you can only lookup specific timestamps. A system that uses a b-tree based store may be marginally slower for individual queries, but by permitting lookup based on a range of timestamps, can easily query the records of interest. This more than offsets the difference in basic query speeds.
Ruby comes with a simple hash based key-value store called PStore. If you’ve never used it, it’s definitely worth learning. For simple data storage, PStore can be very useful.
(more…)