Thursday, May 16, 2013

50 Data Principles For Loosely-Coupled Identity Management

It's been a while since our eBook on Loosely-Coupled IAM (Identity and Access Management) came out. In it, my co-author Umesh Rajbhandari and I described a radically simpler and more elegant architecture for a corporate identity management system, an architecture we called LIMA (Lightweight/Low-cost/Loosely-coupled Identity Management Architecture).

Given the developments since then, it looks like that book isn't going to be my last word on the subject.

IAM has quickly moved from within the confines of a corporate firewall to encompass players over the web. New technology standards have emerged that are in general more lightweight and scalable than anything the corporation has seen before. The "cloud" has infected IAM like everything else, and it appears that IAM in the age of the cloud is a completely different beast.

And yet, some things have remained the same.

I saw this for myself when reviewing the SCIM specification. This is a provisioning API that is meant to work across generic domains, not just "on the cloud". It's part of the OAuth 2.0 family of specifications, and OAuth 2.0 is an excellent, layerable protocol that can be applied as a cross-cutting concern to protect other APIs. SCIM too is OAuth 2.0-protected, but that's probably where the elegance ends.

The biggest problem with SCIM is its clumsy data model, which then impacts the intuitiveness and friendliness of its API. I critiqued SCIM on InfoQ, and in response to a "put up or shut up" challenge from some of the members of the SCIM working group, I began working on an Internet Draft to propose a new distributed computing protocol, no less. That's a separate piece of work that should see the light of day in a couple of months.

In the meantime, I began to work on IAM at another organisation, a telco this time. My experiences with IAM at a bank, an insurance company and then a telco had by then given me a much better understanding of Identity as a concept, and I began to see that many pervasive ideas about Identity were either limiting or just plain wrong. Funnily enough, most of these poor ideas had more to do with the Identity data model than with technology. I also observed that practitioners tended to focus more on the "sexy" technology bits of IAM and less on the "boring" data bits, and that explained to me, very convincingly, why systems were so clumsy.

I then consciously began to set down some data-specific tips and recommendations that I saw being ignored or violated. The irony is that it doesn't cost much to follow these tips. All it costs is a change of mindset, but perhaps that's too high a price to pay for many! In dollar terms, the business benefits of IAM can be had for a song. Expensive technology is simply not required.

So that's the lesson I learnt once more, and the lesson I want to share. No matter what changes we think are occurring in technology, the fundamental concepts of Identity have not changed. The data model underlying Identity has not changed. Collectively, we have a very poor understanding of this data model and how we need to design our systems to work with this data model.

So here are 50 data principles for you, the architect of your organisation's Identity Management solution. I hope these will be useful.

The presentation on Slideshare:

The document hosted on


dylbud said...

Hi Ganesh,

Can I ask you a question about Principle #10 (Identifiers are forever)?

It has been suggested by members of my team that, for shared external identifiers, this principle is really only true for the party or system that generates the shared identifier and publishes data to another domain for consumption by that domain.

For example, my system is receiving data from another system. We match records via a shared identifier that is generated and managed by that other system. If the other system chooses to delete an entity, and subsequently re-issues that same identifier for a different entity, my system must honor that, correct?

In other words, if the external system/domain is not following these principles correctly, we can't force them to.

Does this seem right to you? I suppose another more generalized way of asking this question is: must both Domains implement this principle in order for it to work?

Thanks in advance for your time,


Ganesh Prasad said...

Hi Dylan,

Sorry for the delay in replying. I only just saw your comment.

No, it isn't necessary for both domains to implement this principle, although that would of course be ideal. In the next section after Principle 10 (on splitting and merging identities), you can see a mapping between internal identifiers and external identifiers. This is mainly to handle cases when you discover false positives and false negatives in your own data. But you can also use this mapping to mark one association as obsolete, with an optional end-date qualifier, and mark the latest mapping with a start-date qualifier. With this, you can uniquely identify two entities within your system, and you can also tell that their external identifier was such-and-such from a given time until a given time.

So even if you get a query from your counter-party domain quoting the recycled identifier, the date associated with that query will tell you which mapping entry to use to identify the entity within your domain.
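To make the date-qualified mapping concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not something taken from the document: the names `IdentifierMapping`, `resolve`, `valid_from` and `valid_to`, and the sample identifiers, are all hypothetical, and the end date is treated as exclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class IdentifierMapping:
    """One association between an internal and an external identifier.

    All field names here are hypothetical, chosen for illustration.
    """
    internal_id: str                 # never reused within our own domain
    external_id: str                 # may be recycled by the other domain
    valid_from: date                 # start-date qualifier
    valid_to: Optional[date] = None  # end-date qualifier; None = still current

    def covers(self, on: date) -> bool:
        # Assumption: the interval is half-open, [valid_from, valid_to).
        return self.valid_from <= on and (self.valid_to is None or on < self.valid_to)


def resolve(mappings: List[IdentifierMapping], external_id: str, on: date) -> Optional[str]:
    """Return the internal identifier that external_id referred to on the given date."""
    matches = [
        m.internal_id
        for m in mappings
        if m.external_id == external_id and m.covers(on)
    ]
    if len(matches) > 1:
        # Overlapping entries mean the mapping table itself needs fixing.
        raise ValueError(f"ambiguous mapping for {external_id} on {on}")
    return matches[0] if matches else None


# The external system deleted an entity and later re-issued "EXT-42".
MAPPINGS = [
    IdentifierMapping("int-001", "EXT-42", date(2010, 1, 1), date(2012, 6, 30)),
    IdentifierMapping("int-002", "EXT-42", date(2013, 1, 1)),
]

old_entity = resolve(MAPPINGS, "EXT-42", date(2011, 5, 1))
new_entity = resolve(MAPPINGS, "EXT-42", date(2013, 5, 16))
nobody = resolve(MAPPINGS, "EXT-42", date(2012, 9, 1))
```

The internal identifiers stay permanent and distinct, so the two entities never collide inside your own domain; only the mapping rows record that the same external identifier pointed at different entities at different times. A query dated in the gap between the two rows resolves to nothing, which is a useful signal in its own right.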

Hope this helps.