Wednesday, April 01, 2009

The GUIDness of UUID

Microsoft calls it the GUID. The rest of the world calls it the UUID. But regardless of its name, the Universally Unique ID or Globally Unique ID is truly unique.

On the face of it, what's so great about a 128-bit number apart from its length?

Well, plenty. The length, humble as it may seem, crosses a tipping point of sorts, and imparts to the UUID a number of extremely useful properties.


340,282,366,920,938,463,463,374,607,431,768,211,456

That's what 2 to the power 128 is.

What's the probability of two randomly generated 128-bit numbers being the same?

2 to the power -128, or 0.0000000000000000000000000000000000000029.

That's 38 zeroes after the decimal point before the first significant digit. In contrast, the much-vaunted "five nines" (99.999%) reliability of the very best computer systems corresponds to a failure probability of 0.00001.
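To check the arithmetic, here is a minimal Java sketch that computes both figures. The class and method names (UuidSpace, distinctValues, matchProbability) are made up for illustration, and the probability assumes both numbers are drawn uniformly and independently.

```java
import java.math.BigInteger;

class UuidSpace {
    // Number of distinct values that fit in the given number of bits: 2^bits.
    static BigInteger distinctValues(int bits) {
        return BigInteger.ONE.shiftLeft(bits);
    }

    // Probability that two independent, uniformly random values of that
    // width are identical: 2^-bits.
    static double matchProbability(int bits) {
        return Math.pow(2.0, -bits);
    }

    public static void main(String[] args) {
        System.out.println(distinctValues(128));   // a 39-digit number
        System.out.println(matchProbability(128)); // roughly 2.9E-39
    }
}
```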

Now think about it. In the past, if we wanted to guarantee that two entities were given unique IDs, the only way to do so was through a sequence number generator maintained at a single point.

Suddenly, the UUID provides a new way. We can randomly generate IDs of this length from any number of independent sources, and they're virtually guaranteed never to conflict! Of course, our pseudorandom number generator had better be cryptographically secure, or all bets are off...
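How safe is "virtually guaranteed" once many IDs are in play? A rough sketch, assuming the standard birthday approximation p ≈ n² / 2^(bits+1) for n uniformly random IDs, which only holds while p is small. The names are hypothetical. The second call uses 122 bits because the random (version 4) UUIDs defined in RFC 4122 reserve 6 of their 128 bits for version and variant markers.

```java
class CollisionEstimate {
    // Birthday approximation: chance that at least two of n uniformly random
    // IDs with the given number of random bits collide. Valid for small results.
    static double collisionProbability(double n, int randomBits) {
        return (n * n) / Math.pow(2.0, randomBits + 1);
    }

    public static void main(String[] args) {
        // One billion IDs with all 128 bits random.
        System.out.println(collisionProbability(1e9, 128));
        // One billion version 4 UUIDs (122 random bits).
        System.out.println(collisionProbability(1e9, 122));
    }
}
```

Even at a billion IDs, the estimate stays around 1.5 in 10^21 for 128 random bits and around 9 in 10^20 for 122.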

Many languages provide libraries to generate random UUIDs. Java has the utility class java.util.UUID.

One generates a UUID in Java through the static method call UUID.randomUUID(). Calling toString() on the result produces a string representation of the UUID, which has 36 characters (32 hex characters plus 4 hyphens at defined places).

E.g., cd6b31ee-a877-41fe-a8f1-87d35d2045a6

This is slightly more compact than the decimal representation (up to 39 digits) and possibly more portable between systems.
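The calls described above can be sketched as follows. The wrapper class UuidDemo and its newId() helper are hypothetical; randomUUID(), toString(), version() and fromString() are methods of java.util.UUID.

```java
import java.util.UUID;

class UuidDemo {
    // Returns a freshly generated random UUID in its 36-character text form.
    static String newId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        String text = id.toString();

        System.out.println(text);          // different on every run
        System.out.println(text.length()); // 36
        System.out.println(id.version());  // 4, i.e. randomly generated

        // The text form can be parsed back into an equal UUID.
        System.out.println(UUID.fromString(text).equals(id)); // true
    }
}
```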

There are any number of possible uses for UUIDs, especially when combined with other ingenious innovations like 2D barcodes. Complete identities and properties can be output in barcode format and affixed to real-world entities, providing a ready link to corresponding data in databases.

I've known of the existence of UUIDs for a while, but I'm only now waking up to their potential.

Update 10/09/2009: Here's the regular expression for a UUID:
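The pattern itself is missing from this copy of the post. As a stand-in, here is a sketch of one pattern that fits the 36-character layout described earlier (hex groups of 8, 4, 4, 4 and 12 separated by hyphens), wrapped in Java. It is an assumption, not necessarily the expression originally posted, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class UuidPattern {
    // Assumed pattern: hex groups of 8-4-4-4-12 characters joined by hyphens.
    // It checks the textual layout only, not the version or variant bits.
    static final Pattern UUID_TEXT = Pattern.compile(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    // True only if the whole string has the UUID text layout.
    static boolean looksLikeUuid(String s) {
        return UUID_TEXT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUuid("cd6b31ee-a877-41fe-a8f1-87d35d2045a6")); // true
        System.out.println(looksLikeUuid("cd6b31eea87741fea8f187d35d2045a6"));     // false
    }
}
```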


Julien said...

In related news, the number of atoms in the Universe is believed to be around 10^80.

So if we want a big database of every single atom, we need a larger UUID, something around 266 bits long according to my calculations.

But then even if we can store one row of the database in the state of one atom on the hard drive, the whole Universe will be just enough to build the database. And there won't be space for replication or caching.

There may not even be enough atoms left for the DBA :-)

Ganesh Prasad said...

> But then even if we can store one row of the database in the state of one atom on the hard drive, the whole Universe will be just enough to build the database.

Just big enough from a conceptual point of view, but building a database has its own overheads (system catalogs, etc.), so it can't be done at all.

That's a bit like Gödel's Incompleteness Theorem, isn't it? The Universe isn't big enough to model itself.