The Wisdom of Ganesh: identity and access management

Sunday, July 08, 2012

Architectural Lessons in Identity from the Higgs Boson

As I read the details of last week's virtual confirmation of the existence of the Higgs boson, I smacked myself on the head with a "D'oh!"

As an architect, I've been preaching against tight coupling for years, and I utterly failed to realise that there could be examples of tight coupling in the way we think about the real world.

What the discovery of the Higgs boson does is confirm what is called the Standard Model of Particle Physics, and one of the fundamental, though unintuitive, aspects of that model is that mass is not an inherent property of a particle. A particle may acquire mass through interaction with the Higgs field, but it need not. That's why it's possible for particles like photons to have no mass.

It's ironic that in the months just prior to this discovery in the world of Physics, I made a modest discovery of my own that nevertheless shook the foundations of my conceptual world. In the realm of identity management, I realised that any given entity has no inherent properties (attributes) at all! All attributes that an entity may be deemed to have are only by association (just as a particle acquires mass independently of its existence), and therefore the only necessary aspect of an entity is a unique and meaning-free identifier to set it apart from other similar entities. Attributes can then be assigned to it and removed from it at will through the mechanism of the identifier, and gradually a more sophisticated model can be built up. In creating such sophisticated models, it's important to remember that nothing is inherent to an entity: not first name/last name, not primary email address, not social security number. Nothing! Just a unique identifier that is internal to the domain and kept invisible to the world outside the domain.

I've found this minimalist model of identity to be amazingly powerful and flexible. In addition to having arbitrary groups of attributes associated with an entity's identifier to form its properties, multiple external identifiers can also be independently associated with the (internal) identifier. The usual surrogate for identity, the username, is then just one such external identifier. Much of the conceptual confusion in identity management comes from mistaking external identifiers like username for the entity itself, leading to extremely clumsy and costly implementations. By creating an explicit internal identifier and treating any external identifier as an attribute associated with the internal identifier, such confusion and costly error can be avoided.
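To make the idea concrete, here is a minimal sketch in Python. The class and method names (`IdentityStore`, `link_external`, `resolve` and so on) are my own illustrative assumptions, not part of any product or standard. The entity is nothing but an opaque internal identifier; attributes and external identifiers such as a username are merely associated with it.

```python
import uuid


class IdentityStore:
    """Hypothetical in-memory store: an entity is only an internal identifier."""

    def __init__(self):
        self._attributes = {}  # internal_id -> {attribute name: value}
        self._external = {}    # (scheme, value) -> internal_id

    def create_entity(self):
        # The only thing created is a meaning-free identifier.
        internal_id = uuid.uuid4()
        self._attributes[internal_id] = {}
        return internal_id

    def assign(self, internal_id, name, value):
        # Attributes are associated with the identifier, never inherent.
        self._attributes[internal_id][name] = value

    def unassign(self, internal_id, name):
        self._attributes[internal_id].pop(name, None)

    def link_external(self, internal_id, scheme, value):
        # A username (or any other external identifier) is just one more association.
        key = (scheme, value)
        if self._external.get(key, internal_id) != internal_id:
            raise ValueError(f"{scheme}:{value} is already linked to another entity")
        self._external[key] = internal_id

    def unlink_external(self, scheme, value):
        self._external.pop((scheme, value), None)

    def resolve(self, scheme, value):
        # Returns the internal identifier, or None if the external one is unknown.
        return self._external.get((scheme, value))

    def attributes(self, internal_id):
        return dict(self._attributes[internal_id])
```

With this shape, changing a username is just an `unlink_external` followed by a `link_external` against the same internal identifier. The entity and all its other associations are untouched.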

Cross-domain entity references can also be managed very cleanly by mapping both domains' internal identifiers to a shared external identifier, so that neither domain is coupled to the other domain's internal identifier for an entity. That's goodness!

Federated identity, such as when using one's Facebook ID to enter a website, is nothing but treating the Facebook ID as an external identifier and associating it with the website's internal identifier to establish the entity's identity. So multiple domains can independently maintain models about the same entity, and most of the time, they can behave as though they are dealing with different entities. When it becomes necessary for the two domains to recognise these entities as the same, they then create a shared identifier that is external to both of them, and they each map this external identifier to their respective internal identifier.

This is a model of federation that does not make the assumption that there is a centre of the universe. No domain is the centre of the universe. Every domain is independent, and when they need to share their model of an entity, they loosely associate their entity references through a shared external identifier. There is now traceability across domains, but no implied control. I've found in practice that this is the only model that works: technically, logistically and politically.
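A rough sketch of this kind of federation, again in Python and again with names (`Domain`, `federate`) that are purely illustrative assumptions: each domain keeps its own internal identifier private, and the two are tied together only through a shared identifier that belongs to neither.

```python
import uuid


class Domain:
    """Hypothetical domain that never exposes its internal identifiers."""

    def __init__(self, name):
        self.name = name
        self._by_shared = {}  # shared external identifier -> internal identifier

    def new_entity(self):
        # Internal, meaning-free identifier known only inside this domain.
        return uuid.uuid4()

    def link(self, internal_id, shared_id):
        self._by_shared[shared_id] = internal_id

    def lookup(self, shared_id):
        return self._by_shared.get(shared_id)


def federate(domain_a, id_a, domain_b, id_b):
    """Create an identifier external to both domains and map each side to it."""
    shared_id = str(uuid.uuid4())
    domain_a.link(id_a, shared_id)
    domain_b.link(id_b, shared_id)
    return shared_id
```

Neither domain ever learns the other's internal identifier. Each can resolve the shared identifier to its own record, which gives traceability without giving either side control over the other.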

It's amazing that I should receive indirect confirmation of my identity model from as unlikely a source as Particle Physics, but I guess that's just a reflection of the fundamental nature of all things.

We have nothing inherent to ourselves but an identity. That's a pretty humbling thought that verges on the religious.

Thursday, April 26, 2012

An Optimistic Approach to Identity

I've been working in the Identity Management area for a few years now, and I've seen three different industries up close (banking, insurance and telecom). What I'm struck by in all these industries is that none of them has historically been customer-centric in their business approach. For decades, banks have always looked at their customers through the prism of accounts, insurance companies through policies, and telecom companies through billing accounts and sometimes carriage services (broadband or mobile services). And everywhere, the holy grail is the same - "single view of customer". Identity and Access Management (IAM) is the way these organisations aim to achieve single view of customer as well as other benefits.

However, IAM initiatives at organisations in all these industries have generally floundered. Why?

I believe that IAM is simple but subtle. That's why although it's not hard to design and deliver an IAM system, it's also treacherously easy to get it wrong.

Some of the major reasons why organisations struggle with IAM are these:

1. Rather than bite the bullet and create a top-level data entity called "customer" with its own unique identifier, organisations choose what they consider a cheaper compromise because of a misplaced belief that using a surrogate for customer (i.e., account, policy, billing account) would somehow do the job. Reality check: it doesn't, and it's more expensive in the long run.

2. Even where identifiers are created for customers, these are not carefully designed. The result is that many identifiers that are chosen have business meaning. It's quite funny at one level to see a system designed with a person's email address as their identifier, and where the major business pain point is that it's very hard to handle the situation where a customer changes their email address. (Why are we not surprised?) Quite often in such cases, there is no other way around the problem but to delete and re-create the customer record.

3. Even where organisations avoid the first two mistakes and embark on an IAM initiative to tie customer data across multiple systems to a new, unique and meaning-free customer ID, they run into logistical problems relating to the existing user base. They struggle to "marry" records across systems to the appropriate customer entity because of the sheer volume of data involved, the cost of changing existing systems, the unreliability of matching algorithms and the need to replace engines while the plane is flying, so to speak. The two problems with matching algorithms are false positives (two or more customers being assigned the same identifier) and false negatives (a customer being assigned two or more identifiers).

I have some suggestions that can make life easier.

1. Create a database external to all existing systems that will maintain mappings. [Resist the temptation to migrate customer attributes from other systems to this one. This is just a mapping database, not a customer master. Use Master Data Management (MDM) principles instead to keep data in source systems in sync.]

2. Use a universally unique and meaning-free identifier for customers. Version 4 (random) UUIDs are a great scheme to use.

3. Adopt an optimistic model of "eventual consistency". I.e., generate a new customer UUID corresponding to every system record, in effect assuming (in the case of a bank) that each account belongs to a different customer, then pare them down to reflect known relationships.

a) You can generate UUIDs for a system in an optimistic way because a version 4 UUID carries 122 random bits, so the probability of two UUIDs conflicting is negligibly low, even if you have hundreds of millions of customers. You can check for duplicates out of band if you're paranoid.

b) Similarly, you can optimistically generate UUIDs in a federated way (i.e., each system generates its own UUIDs corresponding to its surrogate records). The probability of conflict is so low it's worth doing this and checking for duplicates out of band.

c) You can afford to start with a system with a large number of false negatives (but no false positives) because this corresponds to a siloed organisation with no "single view of customer". False positives are a greater danger, and we avoid that with this scheme.

d) You can use the existing intelligence in your systems (i.e., the knowledge of which records belong to the same customer) to merge customer UUIDs relating to the same physical customer by eliminating all but one of them at random. Since UUIDs are meaningless, it doesn't matter which one you keep and which ones you remove.
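The optimistic scheme in points a) to d) can be sketched as follows. This is a minimal illustration under my own assumptions: the names `MappingDatabase`, `register` and `merge` are hypothetical, the storage is an in-memory dictionary rather than a real database, and `collision_probability` uses the standard birthday-bound approximation for the 122 random bits of a version 4 UUID.

```python
import random
import uuid


def collision_probability(n, random_bits=122):
    """Birthday-bound approximation of at least one clash among n random IDs."""
    return n * (n - 1) / (2 * 2 ** random_bits)


class MappingDatabase:
    """Hypothetical mapping store: (system, record) -> customer UUID, nothing else."""

    def __init__(self):
        self._customer_of = {}

    def register(self, system, record_id):
        # Optimistic start: every record is assumed to belong to a new customer.
        key = (system, record_id)
        if key not in self._customer_of:
            self._customer_of[key] = uuid.uuid4()
        return self._customer_of[key]

    def merge(self, customer_ids):
        # Known relationship: these UUIDs are the same physical customer.
        ids = set(customer_ids)
        if not ids:
            raise ValueError("nothing to merge")
        # UUIDs carry no meaning, so the survivor can be picked at random.
        keep = random.choice(sorted(ids, key=str))
        for key, cid in self._customer_of.items():
            if cid in ids:
                self._customer_of[key] = keep
        return keep

    def records_of(self, customer_id):
        return sorted(k for k, c in self._customer_of.items() if c == customer_id)

    def customer_count(self):
        return len(set(self._customer_of.values()))
```

For 300 million customers, `collision_probability(300_000_000)` comes out below 1e-19, which is why checking for duplicates out of band is enough. Each call to `merge` removes false negatives one known relationship at a time, and it never introduces a false positive unless the relationship it was given was wrong.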

Now you're no worse off than you were before in terms of data quality (i.e., your data is just as clean in terms of known relationships). But structurally, you're far better off because you now have a customer data entity for the first time. As your data quality improves with more reliable mappings, the silos effectively disappear and you get to a "single view of customer" with no more changes to data structures or processes.

In the case of a telecom company, your mapping database will now consist of three parts. The first part will map customer UUIDs to billing accounts. The second part will map customer UUIDs to product holdings (mobile, broadband and other carriage services, media products, etc.). The third part will map customer UUIDs to other customer UUIDs to reflect corporate organisational structures and household relationships. With this model, the many problems that telecom companies currently face will simply melt away.

- We can see all the product holdings of a customer to determine what else to sell them. We can see this at an individual customer level as well as at the level of a household or organisational unit.

- We can sell media products even to customers who haven't purchased an underlying carriage service.

- We can group billing accounts independently of product holdings. In a household, the kids use various products but mum or dad alone may pay the bill.
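Here is one way the three-part mapping database might look, as a Python sketch. Everything named here (`TelecomMappings`, `groups`, `holdings_for`) is an assumption for illustration. In particular, I am assuming that a household or organisational unit is itself given a customer UUID, so that the third part is simply a parent-to-members mapping.

```python
import uuid
from collections import defaultdict


def new_customer():
    # Meaning-free customer identifier (version 4 UUID).
    return uuid.uuid4()


class TelecomMappings:
    """Hypothetical three-part mapping store keyed by customer UUID."""

    def __init__(self):
        self.billing_accounts = defaultdict(set)  # part 1: customer -> billing accounts
        self.product_holdings = defaultdict(set)  # part 2: customer -> products held
        self.groups = defaultdict(set)            # part 3: household/org -> member customers

    def holdings_for(self, customer_id):
        """Products held by a customer and, recursively, by its members."""
        seen, stack, products = set(), [customer_id], set()
        while stack:
            cid = stack.pop()
            if cid in seen:
                continue  # guard against cyclic relationships
            seen.add(cid)
            products |= self.product_holdings.get(cid, set())
            stack.extend(self.groups.get(cid, ()))
        return products


household, mum, kid = new_customer(), new_customer(), new_customer()
m = TelecomMappings()
m.groups[household] |= {mum, kid}
m.billing_accounts[mum].add("BA-1")   # mum alone pays the bill
m.product_holdings[kid].add("mobile")  # the kid uses the product
m.product_holdings[mum].add("media")   # media product with no carriage service
```

Because billing accounts and product holdings hang off the customer UUID independently, the kid can hold a product without a billing account, and `m.holdings_for(household)` gives the household-level view.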

As you can see, this kind of design isn't hard. But it requires conceptual clarity around the nature of Identity. As I said before, IAM is simple but subtle. It isn't hard to design and deliver an IAM system, but it's treacherously easy to get it wrong.