Tuesday, April 07, 2009

Death of the Directory

The LDAP directory is a strange beast. It's a data store, but it's hierarchical in nature, rather like the old hierarchical databases most of us only read about in database textbooks. Hierarchical models, as we know, have inferior expressive power when compared to network and relational models. So shouldn't LDAP directories suffer from similar limitations?

It turns out that they do have similar limitations, and the industry as a whole has been going along with the convenient illusion that the emperor has clothes.

Talking to some old industry hands, I became gradually convinced of something that I've suspected for a while - that the LDAP directory was an invention that solved a temporary problem. The problem has gone away, but the awkward solution remains with us.

Twenty years ago, relational databases were relatively slow and cumbersome. Even reads were slow. Relational databases were therefore not a good fit for read-mostly use cases.

Enter the directory server. Directory servers were designed to be very fast at lookups. They were abysmally slow at updates, but that didn't matter. They were designed mainly for lookups, not for updates. Systems requiring on-line transaction processing needed relational databases.

Somewhere along the way, relational databases got faster. Much faster. Faster at reads as well. And at some point, LDAP directories lost their raison d'être. Only the world never noticed.

Today, LDAP directories do some things well - a very small number of things. They perform authentication very well. Checking a user's credentials (a hashed password, for example) is a directory server's bread and butter. It can also enforce password policies, lock accounts after a certain number of failed attempts, and so on. These are all built-in features, whereas relational databases are too general-purpose to do these out of the box.

But relational databases are excellent at modelling complex relationships. Directories suck at this. If we try to shoehorn complex structures into a hierarchical model, we will end up with highly unsatisfactory results. Not only will the model be inelegant and difficult to understand, but the resulting directory bloat will also slow down the system, negating the performance advantage that prompted the choice of the directory in the first place.

The bottom line is that anyone looking to implement a "directory" today would be better off with a hybrid model. They should use an LDAP directory for the limited task it is good at (authentication) and a relational database to model and store all other information. They then gain the best of both worlds. The directory server will hold just the handful of attributes needed for authentication.

Hybrids are all very fine, but how do we bridge between a relational database and an LDAP directory? In other words, what do we use to link a record in a database with the corresponding entry in the directory? One candidate would suggest itself to anyone who has been following this blog lately - the Universally Unique Identifier (UUID).

Of course, not every relational database supports a native UUID datatype. And many DBAs are used to having identity columns for tables, which are autogenerated by the DBMS. In such cases, the UUID could be another attribute of the same table with a unique constraint defined on it. In other words, the UUID is another candidate key, and this is the one used to link an entity within the database to the corresponding attributes in other systems. Because it is generated independently of any one system, it is truly universal and unique.
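To make this concrete, here is a minimal sketch using Python's built-in sqlite3 and uuid modules. The table name, column names and sample attributes are all made up for illustration, and a plain dict stands in for the LDAP directory (it is not a real LDAP client). The point is only to show the identity column and the UUID candidate key living side by side, with the UUID being the one value both stores share.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the DBMS-generated identity column stays the primary
# key, and the UUID is a second candidate key with a unique constraint.
conn.execute(
    """
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        person_uuid TEXT NOT NULL UNIQUE,
        full_name   TEXT NOT NULL,
        department  TEXT
    )
    """
)

# Stand-in for the directory: entries keyed by the same UUID, holding only
# the handful of attributes needed for authentication.
directory = {}


def add_person(conn, full_name, department):
    """Insert a row and return the UUID that links it to other systems."""
    person_uuid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO person (person_uuid, full_name, department) VALUES (?, ?, ?)",
        (person_uuid, full_name, department),
    )
    return person_uuid


def lookup_profile(conn, person_uuid):
    """Fetch the relational side of an entity by its UUID."""
    return conn.execute(
        "SELECT full_name, department FROM person WHERE person_uuid = ?",
        (person_uuid,),
    ).fetchone()


alice_uuid = add_person(conn, "Alice Example", "Engineering")
directory[alice_uuid] = {"uid": "alice", "userPassword": "<hashed credential>"}

# After the directory authenticates "alice", its entry's UUID leads us to
# the rest of her data in the database.
profile = lookup_profile(conn, alice_uuid)
```

Storing the UUID as text is just the lowest common denominator here. A database with a native UUID type could use that instead without changing the idea.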

How do we provision databases and directories so that their records are consistent? There may be no theoretically satisfactory answer (such as a two-phase commit transaction across both), but there are a number of real-world solutions that are adequate. Organisations may find that the hybrid data store is a pragmatic approach after all.
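One such adequate-but-imperfect approach might look like the sketch below. This is my own assumption about what a workable pattern could be, not a prescription: the database insert is held in an open transaction, the directory write is attempted, and the database is committed only if the directory write succeeds. The FakeDirectory class, the DirectoryUnavailable error and the provision_user function are all hypothetical names, and the directory is an in-memory stand-in rather than a real LDAP server.

```python
import sqlite3
import uuid


class DirectoryUnavailable(Exception):
    """Hypothetical error raised by the stand-in directory."""


class FakeDirectory:
    """In-memory stand-in for an LDAP directory; not a real LDAP client."""

    def __init__(self, fail=False):
        self.entries = {}
        self.fail = fail

    def add_entry(self, entry_uuid, attributes):
        if self.fail:
            raise DirectoryUnavailable("directory write failed")
        self.entries[entry_uuid] = dict(attributes)


def make_db():
    """Create a throwaway database with a hypothetical person table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (person_uuid TEXT NOT NULL UNIQUE, full_name TEXT NOT NULL)"
    )
    return conn


def provision_user(conn, directory, full_name, login, password_hash):
    """Write to both stores, rolling back the database if the directory fails."""
    entry_uuid = str(uuid.uuid4())
    try:
        # The INSERT opens a transaction that stays uncommitted for now.
        conn.execute(
            "INSERT INTO person (person_uuid, full_name) VALUES (?, ?)",
            (entry_uuid, full_name),
        )
        directory.add_entry(entry_uuid, {"uid": login, "userPassword": password_hash})
        # Both writes succeeded, so make the database row permanent.
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    return entry_uuid
```

This is not a two-phase commit. If the final commit itself fails, the directory is left with an orphaned entry, so a periodic reconciliation job that compares UUIDs across the two stores would still be needed to mop up.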

1 comment:

Dave said...

Hi Ganesh,

I've been following your last few posts on the topic of UUIDs. Reading this post reminded me of an interesting article I read concerning CouchDB on the IBM developerWorks site. It appears that Damien Katz & Co. have latched on to the UUID concept and put it to good use in abolishing PKs. An interesting read.

Cheers,

Dave