LDAP Directories: The Forgotten NoSQL

By | December 17th, 2009 at 10:12AM

When most Rails developers encounter LDAP, it’s usually for user authentication. And most of the time, there’s no choice: they’re working under a dictate that requires them to use it. Usually, this means Active Directory, but very occasionally something like OpenLDAP or the Sun Java System Directory Server.

It’s hard to imagine now, but there was once great excitement about the potential for LDAP-based directory servers to become more than just authentication servers and morph into general-purpose datastores. LDAP directories promised a single, scalable, high-performance data store that could be queried for common information across multiple applications. After all, directories had a lot of virtues:

  • Fast Queries: LDAP directories were heavily indexed, so query speeds were truly impressive—reliably 10x what a relational database could manage. (Write speed was much slower for the same reason: lots of indexes to update when a write happened)
  • Replication: LDAP directories were an “eventually consistent” data store long before Dynamo or Cassandra. Multi-master replication allowed a distributed network of directories to accept writes at any node, and then relay these updates around the directory network. The last update in time always won.
  • Partitionable: directories were giant tree structures, and branches could be picked up and moved to another server if the directory got too big. There was built-in referential linking from each amputation point to the correct server, and these servers could easily be geographically distributed.
  • Standardized and efficient: coming from a telecom heritage, LDAP was an efficient wire protocol. It was globalized and cross-system. LDAP queries and responses were binary encoded using the Basic Encoding Rules (BER), with ASN.1 as the data representation syntax.
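
To make the replication bullet concrete, here is a minimal sketch of "last update wins" between two masters. It is an illustration only: the `Replica` class, the `converge` helper and the integer timestamps are assumptions made for this example, not how any particular directory server implements replication, and ties between equal timestamps are simply ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical directory node: maps an entry DN to (timestamp, value)."""
    name: str
    entries: dict = field(default_factory=dict)

    def write(self, dn: str, value: str, timestamp: int) -> None:
        # Accept a local or relayed write only if it is newer than
        # what this replica already holds for that DN.
        current = self.entries.get(dn)
        if current is None or timestamp > current[0]:
            self.entries[dn] = (timestamp, value)

    def sync_from(self, other: "Replica") -> None:
        # Relay every update the other replica knows about.
        for dn, (timestamp, value) in other.entries.items():
            self.write(dn, value, timestamp)


def converge(a: Replica, b: Replica) -> None:
    # With two nodes, one exchange in each direction is enough.
    a.sync_from(b)
    b.sync_from(a)


east = Replica("east")
west = Replica("west")
dn = "uid=jdoe,ou=people,dc=example,dc=com"

# Both masters accept a conflicting write to the same entry.
east.write(dn, "mail: jdoe@east.example.com", timestamp=100)
west.write(dn, "mail: jdoe@west.example.com", timestamp=105)

converge(east, west)
winner = east.entries[dn][1]
```

After `converge`, both replicas hold the later write, and the earlier one is silently discarded. That is the trade-off of this style of eventual consistency: every node can take writes, but colliding updates are lost rather than merged.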

In addition to these benefits, directories like Netscape Directory Server and Microsoft Active Directory had a seemingly endless list of other features like rich, complex configurable access control rules and permissions; multiple ways to define groups; rich query semantics and more.

And yet, when we look around today, it’s not LDAP directories that have the NoSQL buzz; it’s the far looser and simpler key-value stores like Cassandra, MongoDB and Redis. So where did LDAP fall down, and is there anything to be learned from its (relative) failure? Here is my own take on why LDAP didn’t take over the world, colored by my (brief) tenure as a product manager for Netscape Directory Server.

  1. Telecom protocols FTL: LDAP, in my own humble opinion, was fatally crippled by its telecom parentage. Just reading the first page of the ASN.1 data structure specification could make your eyes bleed. Debugging a badly behaved LDAP client or query was basically a job for experts wielding binary-to-text crackers. There was a separate format, LDIF, for representing LDAP data as human-readable text, but this was a friction point. Compared to ASN.1, JSON (as an example) is severely limited and incomplete, and yet… about 1000x more popular as a result.
  2. Access control that exceeded human brain capacity: LDAP directories provided lots of rope for people who cared about security to firmly and irrevocably tie themselves in knots. Time and again, I’d see customers with five or more layers of access control rules they found to be confounding, with counter-intuitive effects. Better yet, this level of complexity was indecipherable by anyone without drawing five dimensional set diagrams. Sometimes, there are features you shouldn’t put into a product no matter how much people ask you. They know not what they do.
  3. Interesting data wanted to be relational: it was a simple but sad truth. Data that’s interesting and important enough to be accessed often by your applications seems to want to be compared and operated on in the context of your other interesting data; that sounds a lot like the right case for a relational database. Directories, as a hierarchical data store, couldn’t easily accommodate the kinds of queries that customers ended up wanting to do once they were storing enough interesting data. So the solution was to patch in “relationy” features like aliases, which soft-linked two values in different parts of the tree, but these were patchwork solutions. In their worst (over-used) incarnation, they turned a directory server into a weird, hard-to-maintain mutant hybrid of relational and hierarchical database.
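
To give a feel for the debugging problem in point 1, here is a rough sketch that hand-encodes an anonymous LDAPv3 simple bind request in BER and prints it next to a JSON rendering of the same fields. The helpers `tlv`, `ber_small_int` and `bind_request` are hypothetical names written for this example, they only handle short values, and the JSON shape is invented purely for comparison (LDAP has no JSON wire format).

```python
import json


def tlv(tag: int, payload: bytes) -> bytes:
    """Encode one BER tag-length-value triple (short-form length only)."""
    if len(payload) > 127:
        raise ValueError("this sketch only handles payloads under 128 bytes")
    return bytes([tag, len(payload)]) + payload


def ber_small_int(value: int) -> bytes:
    """Encode an INTEGER (tag 0x02) that fits in a single byte."""
    if not 0 <= value <= 127:
        raise ValueError("this sketch only handles integers 0..127")
    return tlv(0x02, bytes([value]))


def bind_request(message_id: int, dn: str, password: str) -> bytes:
    """Build an LDAPMessage wrapping a simple BindRequest."""
    body = (
        ber_small_int(3)                        # protocol version 3
        + tlv(0x04, dn.encode("utf-8"))         # name: OCTET STRING
        + tlv(0x80, password.encode("utf-8"))   # simple auth: context tag [0]
    )
    # 0x60 is the BindRequest tag ([APPLICATION 0], constructed);
    # 0x30 is the outer SEQUENCE that forms the LDAPMessage envelope.
    return tlv(0x30, ber_small_int(message_id) + tlv(0x60, body))


anonymous = bind_request(1, "", "")
hex_dump = " ".join(f"{byte:02x}" for byte in anonymous)

# Invented JSON equivalent, for readability comparison only.
as_json = json.dumps(
    {"messageID": 1, "bindRequest": {"version": 3, "name": "", "simple": ""}}
)

print(hex_dump)  # 30 0c 02 01 01 60 07 02 01 03 04 00 80 00
print(as_json)
```

The BER form is 14 bytes and the JSON form is several times larger, which is the efficiency argument for the binary protocol. But only the JSON line can be read in a log file without a decoder, which is the adoption argument against it.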

There were other downsides to LDAP directories of course. The learning curve could be steep for LDAP since it was a truly novel technology for most people used to RDBMSs and SQL. And probably most importantly, most directories weren’t open source, and so they missed the opportunity to fully leverage a community of interested developers and administrators.

Lessons for this Generation of NoSQL (?)

I hesitate to speculate on the lessons from LDAP for this generation of NoSQL stores, since open source has changed the game considerably in the last ten years. That said, I do think LDAP got a lot of things right (fast, distributable, scalable and standardized). It’s arguable whether custom binary protocols (such as MongoDB’s) will really hurt adoption as long as the data structure specifications are reasonably readable, but Couch’s JSON/REST/HTTP combo is certainly a little easier on the eyes.

I do know one thing: keep the access control simple. Your users will thank you later!

  • http://needlesslymessianic.com Jayson Vantuyl

    Access control and the line format were absolutely horrible in LDAP. I truly believe that's what killed it. Otherwise, it was fairly brilliantly designed. To this day, I've never used anything that replicated so smoothly.

    For those that care, X.500 (of which, you only ever see X.509 anymore) had this massive encoding system called ASN.1. It had four different actual encodings that all represented a single schema of abstract data structures. It even had a compiler language for these encodings (if you've ever seen an SNMP MIB, you've seen it). This made it so unwieldy that nobody could implement it (at least not without buying literally $3000 of telecom specs).

    LDAP was a copy of X.500 DAP (which was much heavier), so instead of using the full encoding (called BER), they implemented a stripped-down version (called LBER). Even so, it was so impossible that nobody could implement it very well.

    The final nail in the coffin, which Michael alludes to, was the eventual use of SASL (of Cyrus IMAP Server fame) for authentication. This basically made it so horrible that all programmers exposed to the standards either clawed their eyes out, burst spontaneously into flame, went insane, or some combination thereof.

    The ultimate moral of this story is "Simple is good".

  • http://directory.apache.org Emmanuel Lécharny

    Interesting post. However, I think that blaming ASN.1 for being one of the reasons LDAP is not mainstream is wrong, and you also missed two other important points.

    Why is ASN.1 not a problem? Simply because nobody cares about decoding the protocol, just as nobody tries to decode what is sent over RMI/IIOP when writing a J2EE application. What is important is to be able to send requests to an LDAP server and to receive responses. And here, you can blame JNDI as one of the worst possible choices when it comes to doing the job…

    Now, there are other LDAP pitfalls:
    - The LDAP schema is really a PITA. Nothing compared to the simplicity of an RDBMS schema, really. Not only is it complex, but for many reasons, many servers like AD (but is this an LDAP server or a NIS server, all in all?) do not allow you to modify the schema when you really need to do so. There is also a good "chance" that you won't be able to transfer your LDAP schema from one server to another.

    - LDAP configuration and installation are far from simple. Just install an OpenLDAP server to see what I mean… Not that OpenLDAP is bad, it's probably one of the best LDAP implementations ever, but there are so many badly documented options that at the end of the day, you prefer to keep the LDAP server away from applications…

    However, I really think that LDAP can evolve a lot in the next few years, and there is now more than one OSS implementation available:
    - Apache Directory Server
    - OpenDS
    - OpenLDAP
    - UnboundID

  • http://www.engineyard.com Michael Mullany

    I'm in full agreement with JNDI, the LDAP schema and config being just three more of the big pitfalls of LDAP. (All the drama about inetorgperson back in the day was a good illustration) But I disagree about ASN.1 – it wasn't important for users or admins, but it did affect the number of people who could debug problems with LDAP clients (and proxies). I'm pretty interested to see what happens with UnboundID — Steve Shoaff was the directory server product manager after me at Netscape.

  • http://directory.apache.org Emmanuel Lécharny

    IMO, the main problem is that the client API doesn't have a decent log system, so you can't analyze what is being exchanged. That's too bad, but it's not ASN.1's fault :) Otherwise, we have written a small tool, an LDAP proxy GUI, for that purpose (http://svn.apache.org/viewvc/directory/sandbox/ol… but sadly never had time to clean it up…

    AFAICT, UnboundID is on track. It's good to see that there are more and more LDAP open source initiatives these days.

  • jeemster

    I do not think LDAP has failed.

    I work with many large organizations, and all of them have LDAP directory stores.

    The LDAP vs. relational database debate has gone on for years, and the answer is still the same: they serve two different purposes.

    Using an LDAP server to keep a history of who went through the card swipe and when is a poor implementation.

    But expecting a card reader to check in real time whether a person is authorized through a SQL DB, with the heavy overhead of the SQL protocol and even more so of the client, is just as wrong.

    As for access control, to imply that access control in an RDBMS is simpler than in LDAP is shortsighted. Often access control within the RDBMS is simply bypassed, and the access control is performed within each application.

    As for installing LDAP servers vs. installing an RDBMS, do you really think installing an Active Directory server is harder than installing Oracle?

    And sure, OpenLDAP is difficult to install, in large part due to it being open source. Installing a commercial LDAP server from Novell or Sun, or I am sure other vendors, is certainly easier than installing Oracle or many of the other commercial RDBMSs.

  • John

    Where I work we use LDAP a lot. It's great for a lot of things, and I think this article really hit the nail on the head of what those good things are. But for many things it's a poor fit. Take for instance a large enterprise with 100k employees. These person records will likely be spread over dozens of distinct companies and then even more departments and divisions, etc. Let's say a developer asks a very basic question like, "Can I get a list of company names?" Well the answer is yes, query all 100k records and then parse out the duplicates in code. There's no "select distinct" in LDAP. There are no joins either so you can't really (in a clean way anyhow) have something like a lookup table. You might be able to accomplish this using groups, but it's not meant to work that way and in the end it's a hack.

    Another thing that's annoying is, at least in IBM's LDAP, there's no way to do pagination in the way that we think of pagination typically. For instance, there's no "select * from table limit 1,25" style syntax. That one might be a case specific to IBM, and I think they might even have that covered in a later version, but it's just another example where we've run into trouble with LDAP.

  • Charlie

    I think the right way to use LDAP is as a directory of stuff like people, groups/roles, and network resources like hosts and printers. I've also heard that it's common to have calendar data in there, but I think there are better alternatives (e.g. CalDAV). It should be thought of as a primarily (and if possible completely) read-only store for applications and hosts, because of the last-write-wins behavior mentioned in this post (no transactions), unless it's really ok to just lose some updates whenever they collide.

    You can manage all this data in a relational database, and applications can send their updates to that relational database. Then those updates can be pushed from your relational database to the "root" of an LDAP directory tree and on down to the leaves. Then reads can be really fast. As was mentioned, the query abilities are limited. The queries on the directory should match the directory structure, more or less.

    Why do this? Because some things are written to work this way. Email clients and address books can usually tie into it. But probably more significantly, all your hosts can use your central LDAP directory in place of local files like /etc/hosts, /etc/passwd and /etc/group. Manage all that stuff from one dashboard. That's what Active Directory (Microsoft) and Open Directory (Apple) do, and there's no reason the open source world can't get a little better about doing the same.

    It would also be very nice if applications, Rails apps in particular since this is the Engine Yard blog, could be made LDAP-aware. This would make a lot of sysadmins much happier since they might then be able to manage all the roles/groups/permissions info across all applications from one place rather than having to do it on a per-application basis.

  • http://nimbusds.com Vladimir Dzhuvinov

    Very good analysis!

    In my opinion LDAP directories can still become an attractive NoSQL store, once fitted with a web friendly JSON front-end. This is what I did two years ago with an internal project, which eventually evolved into Json2Ldap, a software product which is now sold on its own.

    It works well for web and cloud applications, for things like central keeping of user and group data, web app user profiles and app configs.

    As for making the schema more flexible, there is the “extensibleObject” class which allows clients to store arbitrary key/value pairs in an entry.

  • Sagar Sonawane

    Gr8 article..!!

    I would like to invite you to join the discussion I started on LinkedIn, inspired by your article. I would really appreciate it if you shared your thoughts on the topic.

    Thanks and Regards,
    Sagar Sonawane

  • Sagar Sonawane

    Apologies, the URL for the topic is http://lnkd.in/eQhM9T

  • http://www.copperykeenclaws.com/in-praise-of-pre-built-software-components-that-keep-me-from-having-to-learn-things/ In Praise of Pre-built Software Components That Keep Me From Having to Learn Things | Coppery Keen Claws

    [...] inner monologues as I did this research sounded like, “Ah, so ldap is the original nosql. That’s interesting. And it involves issues of identity and security which are inherently [...]