Tuesday, May 29, 2012

SOA and MDM - Friends or Enemies?

SOA (Service-Oriented Architecture) and MDM (Master Data Management) are two terms that are often encountered in Enterprise IT. I've found a fair amount of confusion over how these two disciplines overlap, if at all, and whether their principles are compatible with or antithetical to each other.

One of the comments I heard went like this:
"MDM subverts SOA principles because it connects implementation to implementation. It bypasses the business layer with all its rules and validation, and could create a massively coupled mess."
I have a more optimistic view. I think the very opposite is the case. Business rules should only apply when updating the source of truth for any data item. That is emphatically not being compromised by MDM, because MDM explicitly identifies sources of truth and replicas, and only controls the update of replicas whenever the corresponding sources of truth change.

In fact, MDM could help to clean up the mess that may exist with incumbent systems. Many current systems apply business logic redundantly wherever the same data is updated. But common sense tells us that independently updating replicas should be a no-no, and putting a duplicate layer of business logic over such updates is not a solution. It's far better to remove the business logic altogether from around the update of replicas (shocking as it may sound), because MDM can provide a cleaner rule to govern the update, i.e., that replicas are never independently updated but only refreshed when the source of truth is updated. With MDM, business logic around updating a data item only needs to be in a single place - where the source of truth is updated.
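To make the point concrete, here's a minimal sketch in Python of the pattern described above. All class and field names are illustrative, not any real MDM product's API: business rules run exactly once, at the source of truth, and replicas are refreshed verbatim with no duplicate validation layer.

```python
class SourceOfTruth:
    """Holds the master copy of a data item and applies business rules on update."""
    def __init__(self):
        self.records = {}
        self.replicas = []

    def update(self, key, value):
        self.validate(value)           # business rules applied exactly once
        self.records[key] = value
        for replica in self.replicas:  # MDM-style propagation to replicas
            replica.refresh(key, value)

    def validate(self, value):
        # Stand-in for the single layer of business rules.
        if not value:
            raise ValueError("business rule: value must not be empty")


class Replica:
    """A read-only copy that is only ever refreshed, never independently updated."""
    def __init__(self, source):
        self.records = {}
        source.replicas.append(self)

    def refresh(self, key, value):
        # No business logic here: the value already ran that gauntlet
        # at the source of truth.
        self.records[key] = value


master = SourceOfTruth()
copy = Replica(master)
master.update("cust-42", {"name": "Alice", "city": "Sydney"})
```

After the update, `copy.records` holds the validated value without ever having seen the validation code, which is the whole point: one place for rules, many places for data.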

I drew this diagram to try and show that SOA and MDM are complementary organising principles that an enterprise could use to advantage.

[There is sometimes the situation that two data stores hold different horizontal subsets of the same data, such as a database that holds details of Sydney customers and another that holds similar details of Melbourne customers. If these two data stores are mastered by different systems, it may be acceptable to duplicate the business logic that governs updates. However, MDM doesn't come into the picture unless the data is actually being replicated, such as when the Sydney database also holds copies of Melbourne customer data and vice-versa. There would be no need to apply business logic to this replicated data because it has already run that gauntlet once at the source of truth.]

Saturday, May 12, 2012

Tight Coupling in the TCP/IP Stack!

Scandal and embarrassment!

The TCP/IP stack of Internet protocols, the poster child for a layered architecture with well-defined responsibilities and interfaces that abstract out needless dependencies, has a dirty little secret that I stumbled upon recently, now that I work for a telco. But before I tell you what it is, a quick recap of how the technology works.

IP stands for Internet Protocol. Every device has an IP address. [IP version 4 has been the most common so far, and IPv4 addresses look like this: 192.168.1.1 (or in hex, C0.A8.01.01). IP version 6 (IPv6) addresses look like this: 3ffe:1900:4545:3:200:f8ff:fe21:67cf.]
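The dotted-decimal and hex forms are just two renderings of the same four bytes, which a few lines of Python can demonstrate:

```python
# Convert an IPv4 address from dotted-decimal to the dotted-hex form.
addr = "192.168.1.1"
octets = [int(part) for part in addr.split(".")]
hex_form = ".".join(f"{o:02X}" for o in octets)
print(hex_form)  # C0.A8.01.01
```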

The way the Internet works is by routing packets of information, hop-by-hop, from a source to a destination. Each node along the way knows, by looking at the destination IP address in a packet, how to forward that packet so it gets one step closer to its destination. So all that an IP network really has are routing smarts. It's the destination IP address in a packet that holds all the information required for it to reach its intended recipient.
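A router's forwarding decision can be sketched as a longest-prefix-match lookup. The routing table below is made up purely for illustration, using Python's standard `ipaddress` module:

```python
import ipaddress

# A toy routing table: network prefix -> next hop. Real routers hold
# many thousands of prefixes, but the lookup rule is the same.
routing_table = {
    ipaddress.ip_network("10.0.0.0/8"): "router-A",
    ipaddress.ip_network("10.1.0.0/16"): "router-B",
    ipaddress.ip_network("0.0.0.0/0"): "default-gateway",
}

def next_hop(destination):
    """Pick the next hop for a destination by longest matching prefix."""
    dest = ipaddress.ip_address(destination)
    matches = [net for net in routing_table if dest in net]
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return routing_table[best]

print(next_hop("10.1.2.3"))  # router-B (the more specific /16 beats the /8)
print(next_hop("8.8.8.8"))   # default-gateway
```

Note that nothing in this lookup cares *who* the destination device is, only *where* its address says it is, which is exactly the point the next few paragraphs turn on.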

That's great when the sources and destinations of messages are fixed in location. They have a certain IP address assigned to them when they start up, and from then on, that IP address typically doesn't change until they next start up.

Mobile data devices (which include 3G mobile phones and later devices that use the packet-switched data network) have introduced a problem. Their IP addresses need to keep changing because they connect to different nodes (or cells, or towers) as they move, and it would play havoc with routing if they carried their original IP addresses around when connecting to new nodes. So fine, the technology allows for their IP addresses to change dynamically. However, the logical data connections that the devices establish need to remain for the duration of the session. There could be a download going on, for example, and an interruption of the connection will abort the download. Innovations such as "fast mobile IP" were introduced to mitigate the visible effects of the problem, but did not address its root cause.

The root cause lies in a rather ugly fact about IP addresses. An IP address confuses a device's identity with its location. A device's location keeps changing as it moves, but its identity does not change. A location is important to know where packets are to be delivered. But logical concepts like sessions need to be tied to a device's identity, not to its location. These are two different concepts, but a single mechanism (the IP address) has been chosen to implement both of them. As long as the location and identity did not independently change, the design flaw remained hidden. Now with data-enabled mobile devices, device location and device identity show themselves very clearly as two different things, and the conceptual limitation of the IP address has therefore been exposed.

That's the rationale behind the new protocol specification called HIP (Host Identity Protocol). HIP is meant to sit between TCP and IP. Normally, a domain name is resolved by DNS to an IP address. A whole generation of IT professionals has come of age with this principle internalised as an axiom of How Things Work. HIP is a Copernicus or a Galileo challenging an established view. The Sun does not go round the Earth, after all. It's the Earth that goes round the Sun! That's going to take some getting used to. For a networking professional or a web architect, discovering that the venerable TCP/IP stack should actually be the TCP/HIP/IP stack is a bit like discovering that they're an adopted child. But however painful the realisation and readjustment, it's better that the truth be known.

Under the new proposal, a domain name is resolved by DNS to a logical host identity, which then gets further resolved to an IP address! Now, if a device is moving, its IP address can keep changing, but its host identity will remain the same. Therefore TCP connections need not be torn down and re-established. Sessions need not be dropped and re-created.
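The essence of the identity/locator split can be sketched in a few lines of Python. To be clear, this is not the actual HIP wire protocol, and all names here are hypothetical; it simply shows sessions keyed by a stable identity while the IP address is a mutable attribute that can change mid-session:

```python
class Host:
    def __init__(self, identity, ip):
        self.identity = identity  # stable, playing the role of a HIP Host Identifier
        self.ip = ip              # mutable locator; changes as the device moves


sessions = {}  # session id -> host identity (never an IP address)

def open_session(session_id, host):
    """Bind the session to the host's identity, not its current address."""
    sessions[session_id] = host.identity

def deliver(session_id, host):
    """Routing uses the current IP; session lookup uses the identity."""
    assert sessions[session_id] == host.identity
    return f"deliver to {host.ip}"


phone = Host("hip:device-001", "203.0.113.7")
open_session("dl-1", phone)
print(deliver("dl-1", phone))  # deliver to 203.0.113.7

phone.ip = "198.51.100.42"     # device moved to a new cell; locator changed
print(deliver("dl-1", phone))  # session survives: deliver to 198.51.100.42
```

Had `sessions` been keyed by IP address, the second delivery would have failed, which is precisely the mobile-device problem described above.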

RFC 4423 (HIP Architecture) says:
In the current Internet, the transport layers are coupled to the IP addresses. Neither can evolve separately from the other.
There are three critical deficiencies with the current namespaces. First, dynamic readdressing cannot be directly managed. Second, anonymity is not provided in a consistent, trustable manner. Finally, authentication for systems and datagrams is not provided. All of these deficiencies arise because computing platforms are not well named with the current namespaces.
It goes on to say:
An independent namespace for computing platforms could be used in end-to-end operations independent of the evolution of the internetworking layer and across the many internetworking layers. This could support rapid readdressing of the internetworking layer because of mobility, rehoming, or renumbering.
Amazing, isn't it? We've been nursing a tightly-coupled serpent in our collective bosom for over 3 decades, and we didn't even know...

It's going to take a while for HIP to become part of the Internet ecosystem (if it ever does!) The inertia of entrenched ways of thinking could prove too strong to allow a much-needed rationalisation.

The lesson for me personally is that if we don't architect a system right, we will live with its negative implications for a long, long time. Even the founding fathers of the Internet, geniuses as they were, were not perfect, and we can clearly see in hindsight how a conceptual blunder (a conflation of location with identity) has impacted us.

I do believe though, that even the current HIP proposal is making a blunder of its own by confusing identifiers with identity credentials. RFC 4423 says:

In theory, any name that can claim to be 'statistically globally unique' may serve as a Host Identifier. However, in the authors' opinion, a public key of a 'public key pair' makes the best Host Identifier. As will be specified in the Host Identity Protocol specification, a public-key-based HI can authenticate the HIP packets and protect them from man-in-the-middle attacks. 
From my own work on Identity Management, I have come to realise that multiple sets of credentials can be used to arrive at, or establish, an identity. The establishment of an identity within a given context requires an identifier. This identifier may be the credentials themselves, or something else. It's important to realise that the "may be" should not be taken as a "must be". For the purpose of security, the authors of the HIP specification are proposing that verifiable credentials be used as the identifier in all situations. I fear that will result in a similar problem later on when the requirements of authentication and identity establishment diverge in some context. I'll write to the committee explaining my concerns.
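The distinction I'm arguing for can be sketched as follows. This is illustrative pseudo-data, not any real identity-management API: an identity record has one stable identifier, and any number of rotatable credentials that can authenticate to it. Conflate the two, and rotating a credential silently changes the identity.

```python
# An identity registry keyed by a stable identifier. The credentials are
# attributes of the identity, not the identity itself, so they can be
# rotated without disturbing anything keyed on the identifier.
identities = {
    "host-001": {
        "credentials": {"pubkey-A", "pubkey-B"},
    }
}

def authenticate(identifier, credential):
    """Establish identity via the identifier; verify via a credential."""
    record = identities.get(identifier)
    return record is not None and credential in record["credentials"]


print(authenticate("host-001", "pubkey-A"))  # True

# Rotate credentials: the identifier, and every session or session-like
# thing keyed on it, is untouched.
identities["host-001"]["credentials"] = {"pubkey-C"}
print(authenticate("host-001", "pubkey-C"))  # True
print(authenticate("host-001", "pubkey-A"))  # False
```

If the public key itself were the identifier, the rotation in the middle of this sketch would amount to destroying one identity and creating another, which is the divergence I worry about.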

Tuesday, May 08, 2012

The Unholy Alliance of Analyst and Big Vendor

I was browsing IT news as usual and came across a commentary page that discussed Progress Software's recent decision to divest itself of several core products, including Sonic ESB, Savvion BPM and Actional Service Management. To tell you the truth, I wasn't paying attention to the website where this page was hosted. But then I came upon this statement:

The situation reminds us of a key benefit of selecting a top-tier enterprise vendor: IBM and SAP rarely kill established products, and the same can be said of Oracle in recent years.

I've been in the industry long enough to detect a "this-message-brought-to-you-by-your-friendly-neighbourhood-commercial-sponsor" marketing insert when I see one. So I glanced up at the website, and sure enough, it was one of the big analysts, Forrester to be precise.

The funny thing is, I would have drawn a very different inference from the news about Progress.

This news confirms a few things I already know:

1. Commoditisation continues its onward march in product category after product category. This is good news for customers because it means lower prices as well as standardisation of features (which in turn leads to greater interoperability and lower operating costs).

2. Commoditisation is dreaded by vendors because it erodes their profit margins. The large vendors have enough of a lock on their customers through network externalities that they can often maintain their profit margins in spite of commoditisation. But smaller commercial vendors must either exit these market segments or accept lower margins.

3. The unspoken trend in any modern-day commoditisation story is the rise of Open Source. After all, Progress is not quitting these middleware markets because of competition from the big vendors. The pressure on them is from below, from Open Source. And there is a thriving market here in the "supported Open Source" category, as this gleeful blog post from WSO2's VP of Technology Evangelism shows.

I'm disappointed (but not surprised) that a big analyst is spinning what is really a welcome story of commoditisation into a warning to customers to buy the big expensive brands, or else. I guess there's no money in it for them to recommend that customers choose Open Source alternatives to start with.

But really, chee, how low can they go?

Mentoring NoSQL From Adolescence to Maturity

NoSQL is more than just the flavour of the month. There's no doubt that it's here to stay. But the movement is now experiencing growing pains.

In short, most variants of NoSQL have established a niche for themselves by creatively dropping one or more of the ACID constraints of traditional relational databases. Consequently, what they've gained on the swings in terms of features, they've lost on the roundabouts. Today, some of those shortcomings are becoming pain points, and the respective projects are attempting to layer the missing features on top of their existing products.

An expert view is that this is the wrong approach to take. Databases are complex beasts, and their features cannot simply be layered on; they must be engineered in concert from the start. It's a non-intuitive insight, but one that the greybeards of the industry have learnt through hard experience over decades.

It's very timely that one of the most respected names in the database field, IBM Fellow C Mohan, has stepped up to provide much-needed leadership and guidance to the NoSQL movement. His initial analysis and critique of NoSQL is on his blog (Part 1, Part 2, Part 3 and Part 4).

Mohan has promised to study individual NoSQL databases in more detail so as to understand their design nuances better. If he can then propose ways for these projects to enhance their capabilities in the most effective way, he would have succeeded in enabling a whole new wave of applications.

Here's wishing him all success!