It should be no surprise to regular readers of this blog that I am in love with UUIDs. As I have said before, they are an inexhaustible source of identifiers that are meaningless (not a pejorative term!) and whose generation can be distributed/federated without danger of duplication. As a result, they are an extremely powerful means of providing a uniform and federated identity scheme.
As a SOA-indoctrinated IT practitioner, I am loathe to expose domain-specific entity identifiers to the larger world because such leakage leads to tight coupling and brittle integration. Yet identifiers of "resources" must often be exposed. How do we do this? Even "best practice" guidelines fail to adequately admonish designers against exposing domain-specific identifiers. [e.g., Subbu Allamaraju's otherwise excellent
RESTful Web Services Cookbook talks about "opaque" URIs in recipe 4.2 but ends up recommending the use of internal entity identifiers in recipe 3.10 :-(.]
My point is simple. If the 'employee' table in my application's database looks like this
+------+--------------+---------------+-------------+
| id | first_name | last_name | dob |
+------+--------------+---------------+-------------+
| 1122 | John | Doe | 12-Jan-1960 |
+------+--------------+---------------+-------------+
| 3476 | Jane | Doe | 08-Sep-1964 |
+------+--------------+---------------+-------------+
| 6529 | Joe | Bloggs | 15-Jun-1970 |
+------+--------------+---------------+-------------+
I do not want to be exposing resources that look like these
/employees/1122
/employees/3476
/employees/6529
I don't want to expose my application's local primary keys to the entire world. They may be "meaningless" but they're still coupled to my domain's internal data structures. I need something else.
My standard solution so far has been the magnificent UUID. I add a candidate key column to my table, like so
+------+--------------+---------------+-------------+--------------------------------------+
| id | first_name | last_name | dob | UUID |
+------+--------------+---------------+-------------+--------------------------------------+
| 1122 | John | Doe | 12-Jan-1960 | 4885c205-8248-4e5b-9c45-4d042e7cc992 |
+------+--------------+---------------+-------------+--------------------------------------+
| 3476 | Jane | Doe | 08-Sep-1964 | cdbf87dd-93cb-4c53-9c5d-718c596b0a00 |
+------+--------------+---------------+-------------+--------------------------------------+
| 6529 | Joe | Bloggs | 15-Jun-1970 | 73feb1bf-e687-4d58-9750-5bf98ca7b9fa |
+------+--------------+---------------+-------------+--------------------------------------+
or I maintain a separate mapping table, like so
+------+--------------------------------------+
| id | UUID |
+------+--------------------------------------+
| 1122 | 4885c205-8248-4e5b-9c45-4d042e7cc992 |
+------+--------------------------------------+
| 3476 | cdbf87dd-93cb-4c53-9c5d-718c596b0a00 |
+------+--------------------------------------+
| 6529 | 73feb1bf-e687-4d58-9750-5bf98ca7b9fa |
+------+--------------------------------------+
and I expose my resources like so
/employees/4885c205-8248-4e5b-9c45-4d042e7cc992
/employees/cdbf87dd-93cb-4c53-9c5d-718c596b0a00
/employees/73feb1bf-e687-4d58-9750-5bf98ca7b9fa
I've still got unique identifiers, they're guaranteed not to conflict with anything else in time and space, and more importantly,
my domain-specific identifiers remain hidden. I can now even change my entire domain model,
including identifiers, and still preserve my external contracts. That's SOA.
But the sad fact of the matter is that many legacy systems and packaged software do not readily support UUID datatypes or even char(36) columns for various reasons. I have recently heard of a far-sighted software vendor that has provided for a "Public ID" field in their database tables for this precise reason, i.e., to allow an externally visible identifier to be specified for their entities. But alas, the column is defined to be varchar(20), much too small to hold a UUID.
It occurred to me that there is nothing sacrosanct about a 128-bit UUID (expressed as a 36-character string). It's just that the larger a random number gets, the more remote the probability of conflict with another such random number. 128 bits is a nice, safe length. But smaller lengths also have the same property, only with a lower degree of confidence.
The constraints of vendor packages like the one I described above led me to postulate the concept of the LUID (Locally Unique ID). This is just a string that is smaller than 32 hex digits (a UUID has 32 hex digits and 4 hyphens in-between). I call this a Locally Unique ID because the smaller it gets, the lower the confidence with which we can claim it to be universally unique. But we may still be able to rely on its uniqueness within a local system. If I'm only holding details of a few thousand employees (or even a few million customers) in my database, an LUID may still be expected to provide unique identifiers with a reasonable degree of confidence.
That vendor package definitely cannot hold a UUID such as "2607881a-fec1-4e5d-a7fc-f87527c93e2d" in its "Public ID" field, but a 20-character substring such as "4e5da7fcf87527c93e2d" is definitely possible.
Accordingly, I've written a Java class called LUID with static methods
String randomLUID( int _length ) and
String getLUID( String _uuidString, int _length )
The first generates a random hex string of the desired length (without hyphens). The second chops a regular UUID down to a hex string of the required length, again without hyphens.
You can download the class from
here. Just running the compiled class with "java LUID" will result in some test output which should illustrate how it works. Feel free to incorporate it into your own projects, but be warned that there is no warranty ;-).
Of course, there is a limit to how small an LUID can become before it loses its utility, but I'm not going to draw an arbitrary line in the sand over this. The class above represents mechanism, not policy. Application designers need to think about what makes sense in their context. An LUID of the appropriate length can enable them to implement SOA Best Practice by decoupling externally visible resource identifiers from internal entity identifiers (another example of the difference between Pat Helland's
Data on the Outside and Data on the Inside) when a standard UUID cannot be used.
Bottomline: If you can use a standard UUID, do so. If you can't, consider using an LUID of the kind I've described. But always hide the specifics of your application domain (which include entity identifiers) when exposing interfaces to the outside. That's non-negotiable if you want to be SOA-compliant.
Update 27/06/2010:I should perhaps make it clear what my proposal is really about, because judging from a reader comment, I think I may have created the impression that all I want is for entity identifiers to be "meaningless" in order to be opaque. That's actually not what I mean.
To be blunt, I believe that any entity/resource that is to be uniquely identified
from the outside needs _two_ identifiers (i.e., two candidate keys). One of them is the "natural" primary key of the entity within the domain application. The other is a new and independent identifier that is required to support the exposure of this entity through a service interface (whether SOAP or REST). There should be no way to derive this new key from the old one. The two keys should be independently unique, and the only way to derive one from the other should be through column-to-column mapping, either within the same table or through a separate mapping table as I showed above.
To repeat what I wrote in the comments section in reply to this reader:
There was a content management system that generated (meaningless) IDs for all documents stored in it, and returned the document ID to a client application that requested storage, as part of a URI. At one stage, it became necessary to move some of the documents (grouped by a certain logical category) to another instance of the CMS, and all the document IDs obviously changed when reloaded onto the other instance. The client application unfortunately had references to the old IDs. Even if we had managed to switch the host name through some smart content-based routing (there was enough metadata to know which set of documents was being referred to), the actual document ids were all mixed up.
If we had instead maintained two sets of IDs and _mapped_ the automatically generated internal ID of each document to a special externally-visible ID and returned the latter to the client, we could have simply changed the mapping table when moving documents to the new CMS instance and the clients would have been insulated from the change. As it turned out, the operations team had to sweat a lot to update references on the calling system's databases also.
I hope it's clear now.
1. Entity only seen within the domain => single primary key is sufficient
2. Entity visible outside the domain => two candidate keys are required, as well as a mapping (not an automated translation) between the two.
3. UUID feasible for the new candidate key => use a UUID
4. UUID not possible for some reason => use a Locally Unique ID or LUID of appropriate length (code included)