The Wisdom of Ganesh: Idempotence - Reliability that Works

Tuesday, March 16, 2010

Idempotence - Reliability that Works

In many cases of messaging where uncertainty is undesirable (e.g., a timeout on a critical transaction request), the usual solution sought is "reliable message delivery". Apart from its theoretical impossibility, "reliable messaging" is expensive. One needs "strong queuing" products, which cost a lot of money.

I've been a fan of a far simpler approach - idempotence. We're really not after reliable message delivery in most cases. We'd often be happy to settle for certainty, i.e., we don't mind whether a requested operation succeeded, failed, or indeed timed out before it could be attempted, as long as we know what happened. What we don't want is to be stuck, not knowing what happened and afraid to retry the operation for fear of unwitting duplication.

Idempotence is, of course, the property that attempting something more than once has exactly the same effect as attempting it once. Idempotent operations can be blindly retried in a situation of uncertainty, because one is guaranteed that the effect of the operation will never be duplicated, however many times the request arrives.

The trick is to identify every transaction request with a unique identifier. Provided the receiver of the message is set up to check the identifier against previously-used ones, duplication of transactions can be avoided even if requests are sent multiple times. This is extremely powerful because it allows a requesting application to simply retry the request message in situations of uncertainty until a definite response is eventually received. There is no danger of the request being acted upon more than once.
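As a rough illustration of the receiver-side check, here is a minimal Python sketch. The Receiver class, its handle method and the "add to a balance" operation are hypothetical names invented for this example, and the in-memory dictionary stands in for whatever store a real receiver would use.

```python
import uuid


class Receiver:
    """Hypothetical receiver that remembers the response for each request ID."""

    def __init__(self):
        self.balance = 0
        self._responses = {}  # request_id -> response already produced

    def handle(self, request_id, amount):
        # A repeated ID means a retry: replay the recorded response, change nothing.
        if request_id in self._responses:
            return self._responses[request_id]
        # First time this ID is seen: perform the operation and record the outcome.
        self.balance += amount
        response = {"status": "ok", "balance": self.balance}
        self._responses[request_id] = response
        return response


receiver = Receiver()
request_id = str(uuid.uuid4())            # chosen once by the requester
first = receiver.handle(request_id, 100)  # original request
retry = receiver.handle(request_id, 100)  # blind retry with the same ID
```

The retry gets the same response as the original request, and the balance is only credited once. A request with a fresh identifier is treated as a new transaction.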

Reliability is then reduced to an endpoint-based protocol. It does not require any special capabilities on the part of the transport. In fact, the transport can afford to be quite unreliable. Idempotence allows reliable messaging solutions to be built (and quite cheaply at that) on top of unreliable components!
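To make the "unreliable transport" point concrete, here is a small simulation in Python, offered as a sketch rather than a real protocol. Everything in it is made up for illustration: the FlakyTransport deliberately loses the first request and the reply to the second, and the Ledger plays the part of the receiving endpoint.

```python
import uuid


class LostMessage(Exception):
    """Raised by the simulated transport when a request or a reply is lost."""


class Ledger:
    """Hypothetical endpoint that applies each request ID at most once."""

    def __init__(self):
        self.total = 0
        self.seen = {}  # request_id -> result returned the first time

    def apply(self, request_id, amount):
        if request_id not in self.seen:
            self.total += amount
            self.seen[request_id] = self.total
        return self.seen[request_id]


class FlakyTransport:
    """Loses the first request outright and the reply to the second."""

    def __init__(self, ledger):
        self.ledger = ledger
        self.attempts = 0

    def send(self, request_id, amount):
        self.attempts += 1
        if self.attempts == 1:
            raise LostMessage("request never arrived")
        result = self.ledger.apply(request_id, amount)
        if self.attempts == 2:
            raise LostMessage("reply lost after the request was processed")
        return result


def send_until_answered(transport, amount, max_attempts=5):
    request_id = str(uuid.uuid4())  # generated once, reused on every retry
    for _ in range(max_attempts):
        try:
            return transport.send(request_id, amount)
        except LostMessage:
            continue  # uncertain outcome: simply try again
    raise RuntimeError("no definite response after retries")


ledger = Ledger()
transport = FlakyTransport(ledger)
result = send_until_answered(transport, 25)
```

The sender needs three attempts to get an answer, yet the ledger is credited exactly once. The second attempt is the interesting case: the operation succeeded but the sender could not know that, and the retry is what turns the uncertainty into a definite response.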

Here's a one-page document that illustrates the concept.

Hopefully this should make it very clear that we don't need strong queuing or "reliable message delivery" to eliminate uncertainty. A plain web server, a database and a system of one-time tokens (UUIDs?) can solve the problem.
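For the database half of that combination, the following Python sketch uses the standard sqlite3 module with an in-memory database. The table names, the issue_token and submit_payment functions and the "payment" operation are assumptions made for this example; a web server would simply call these from its request handlers.

```python
import sqlite3
import uuid


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tokens (token TEXT PRIMARY KEY, used INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE payments (token TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    return conn


def issue_token(conn):
    """Hand out a one-time token (a UUID) before the transaction is attempted."""
    token = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO tokens (token) VALUES (?)", (token,))
    return token


def submit_payment(conn, token, amount):
    """Return 'applied', 'duplicate' or 'unknown token'."""
    with conn:
        # Marking the token used and recording the payment share one transaction.
        cur = conn.execute(
            "UPDATE tokens SET used = 1 WHERE token = ? AND used = 0", (token,)
        )
        if cur.rowcount == 1:
            conn.execute(
                "INSERT INTO payments (token, amount) VALUES (?, ?)", (token, amount)
            )
            return "applied"
    row = conn.execute("SELECT 1 FROM tokens WHERE token = ?", (token,)).fetchone()
    return "duplicate" if row else "unknown token"


conn = open_store()
token = issue_token(conn)
outcomes = [submit_payment(conn, token, 40) for _ in range(3)]
total = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM payments").fetchone()[0]
```

Submitting the same token three times applies the payment once and reports the other two attempts as duplicates, so the caller always gets a definite answer. A fuller version would probably store the original response alongside the token so that a retry can be told exactly what happened the first time.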

3 comments:

Unknown said...: Hi Ganesh, first of all thanks for your valuable posts. I love simplicity and in many cases a good infrastructure appliance can help a lot. I know you heard about Terracotta. I was impressed by their simple and elegant solution for queue handling: http://blog.markturansky.com/archives/26
What do you think about it?
Bye Michele; 23/3/10 02:49

prasadgc said...: Thanks for the link, Michele. That looks really impressive, and I think I'll have to read it a few times to understand it fully.

It's a 2 year old article, so I wonder if there has been further progress...

Ganesh; 23/3/10 05:10
Unknown said...: Terracotta has done many things in the last two years, but the key point is the same: "JVM-Level Clustering".
That's what makes me feel it's a great idea/product. I share most of your thoughts about the "EJB/JSF cancer", and this product could handle a situation where the development phase is as simple as possible (a PC with Eclipse+Tomcat) and at deployment time you use infrastructure software (Terracotta) to solve high availability problems.
By the way... I'm not a Terracotta reseller and I've never been able to use this technology because "They can't hear you"!!!
Michele; 25/3/10 01:07