Wednesday, June 09, 2010

The Real and Imagined Limitations of REST


One of the things that struck me about my discussion with JJ Dubray on a previous blog posting was how closely we agreed on fundamental architectural principles (decoupling of interface from implementation, avoiding a hub-and-spokes architecture, etc.), yet how diametrically opposed our views were on REST.

For example, I think REST does a great job of decoupling interface from implementation. JJ feels the exact opposite. Why?

Analysing the problem more closely, I guess the common examples of RESTful interface design are partly to blame.

1. A URI like

http://myapp.mycompany.com/customers/1234

where 1234 is the actual id of the customer in my database, would be a horrible example of tight coupling, in my opinion. I believe URIs should be opaque and meaningless. I think designers should take care to create a mapping layer that decouples their implementation data model from their externally exposed service data model.

For example, I would prefer this to be exposed:

http://myapp.mycompany.com/customers/cb77380-7425-11df-93f2-0800200c9a66

There should be a mapping within the service implementation that relates the opaque identifier "cb77380-7425-11df-93f2-0800200c9a66" to the customer's id within the domain data model, i.e., 1234. That would be true decoupling.
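To make the mapping idea concrete, here is a minimal sketch in Python. The class name `CustomerIdMap` and its methods are hypothetical, and the in-memory dictionaries stand in for whatever persistent store a real service would use; it is an illustration of the idea, not a prescribed implementation.

```python
import uuid


class CustomerIdMap:
    """Hypothetical mapping layer between opaque public ids and internal ids."""

    def __init__(self):
        # In-memory stand-ins for a persistent lookup table (an assumption).
        self._public_to_internal = {}
        self._internal_to_public = {}

    def public_id_for(self, internal_id):
        """Return the opaque id for an internal id, minting one on first use."""
        if internal_id not in self._internal_to_public:
            public_id = str(uuid.uuid4())
            self._internal_to_public[internal_id] = public_id
            self._public_to_internal[public_id] = internal_id
        return self._internal_to_public[internal_id]

    def internal_id_for(self, public_id):
        """Resolve an opaque id back to the internal id (KeyError if unknown)."""
        return self._public_to_internal[public_id]

    def uri_for(self, internal_id, base="http://myapp.mycompany.com/customers/"):
        """Build the externally exposed URI; the database key never appears in it."""
        return base + self.public_id_for(internal_id)


id_map = CustomerIdMap()
uri = id_map.uri_for(1234)           # what the service hands out to clients
opaque = uri.rsplit("/", 1)[-1]      # done here only to demonstrate the lookup
internal = id_map.internal_id_for(opaque)
```

The database key can now change (a migration, a merge of two customer tables) without any exposed URI changing, because only the mapping table needs updating.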

Mind you, the earlier example of tight coupling is not a limitation of REST, merely bad application design.

[Incidentally, I think UUIDs have given the world a wonderful gift in the form of an inexhaustible, federated, opaque identification scheme for resource representations. Decoupling identity is a key part of decoupling domain data models (Pat Helland's Data on the Inside) from service data models (Data on the Outside).]

2. Another bad example is the use of "meaningful" URIs. Even though the following two URIs may seem obviously related, client applications must not make any assumptions about their relationship:

http://myapp.mycompany.com/customers
http://myapp.mycompany.com/customers/1234

In other words, a client application must not assume that it can derive the resource representation for the set of customers by merely stripping off the last token "/1234" from the URI of an individual customer.

And this is not a limitation of REST either. The HATEOAS (Hypermedia as the Engine of Application State) principle says that client applications must only rely on fully-formed URIs provided by the service to perform further manipulations of resources. In other words, URIs are to be treated as opaque and client applications must not attempt to reverse-engineer them by making assumptions about their structure.
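A rough sketch of what this means for a client, in Python. Everything here is an assumption made for illustration: the in-memory `SERVICE` dictionary stands in for real HTTP calls, the second URI is made up, and the link relation names "collection" and "item" are just one plausible choice.

```python
# Hypothetical URIs and link relation names, for illustration only.
CUSTOMER = "http://myapp.mycompany.com/51bb2db7-b51c-4fd9-b235-2b310a644a84"
ALL_CUSTOMERS = "http://myapp.mycompany.com/0f8b6c2e-3d4a-4e1b-9c7d-5a6b7c8d9e0f"

# Each representation carries the links the service chooses to advertise.
SERVICE = {
    CUSTOMER: {"name": "Example Customer", "links": {"collection": ALL_CUSTOMERS}},
    ALL_CUSTOMERS: {"links": {"item": [CUSTOMER]}},
}


def get(uri):
    """Stand-in for an HTTP GET that returns a parsed representation."""
    return SERVICE[uri]


def follow(uri, rel):
    """Fetch `uri` and return the URI the service advertises for relation `rel`."""
    return get(uri)["links"][rel]


# The client finds the set of customers by asking, not by editing the URI string.
customers_uri = follow(CUSTOMER, "collection")
```

Nothing in the client depends on how the two URIs are spelled, so the service is free to restructure its URI space as long as it keeps advertising the links.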

URI examples like /customers/1234 are bad from the perspective of architects who understand the evils of tight coupling, and the people who produce them understand it too, but such naive examples have the benefit of being easy to understand.

A RESTful system would work perfectly well with URIs that looked like these:

http://myapp.mycompany.com/51bb2db7-b51c-4fd9-b235-2b310a644a84
http://myapp.mycompany.com/a7ac1f23-6bc7-44be-af9b-4b68fd31879f
http://myapp.mycompany.com/208657ef-1e86-47ba-9c5f-3f177c018293

The first is a URI for a customer, the second is a URI for a bank account and the third is a URI for an insurance policy. How about that? This would be architecturally clean, but most REST newbies would go, "Huh? Weird!" and turn away with a shudder.

Sometimes, the desire for understandability of a design by humans introduces semantic coupling that is anti-SOA. I guess there's a trade-off that we need to be aware of, but it's not a limitation of REST itself. It's an application designer's choice.

3. In another context, JJ has expressed his opinion that SOA is about asynchronous peer-to-peer interaction, and I completely agree. Where I think he has misunderstood REST is in its superficial characteristic of synchronous request-response. There are several well-known design patterns for asynchronous interaction using REST, so in practice, this is not a limitation at all. The HTTP status code of "202 Accepted" is meant for precisely this condition - "I've received your message but have not yet acted on it". At one stroke, it's also a model for reliable message delivery. Combine a POST with a one-time token to ensure idempotence, then keep (blindly) trying until you get a 202 Accepted response. Voila! Reliable message delivery.
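The retry pattern can be sketched in a few lines of Python. This is a simulation under stated assumptions, not real HTTP: `FlakyService` is a hypothetical stand-in for the server, a `TimeoutError` models a lost request or a lost response, and the one-time token is simply a UUID generated by the sender.

```python
import uuid


class FlakyService:
    """Hypothetical stand-in for an HTTP service behind an unreliable network."""

    def __init__(self, failures):
        # Scripted failures, consumed one per call, e.g. ["lost_request"].
        self._failures = list(failures)
        self._seen_tokens = set()
        self.accepted = []

    def post(self, token, message):
        failure = self._failures.pop(0) if self._failures else None
        if failure == "lost_request":
            raise TimeoutError("request never arrived")
        # The one-time token makes a repeated POST harmless.
        if token not in self._seen_tokens:
            self._seen_tokens.add(token)
            self.accepted.append(message)
        if failure == "lost_response":
            raise TimeoutError("response never arrived")
        return 202  # "Accepted": received, not necessarily acted on yet


def send_reliably(service, message, max_attempts=10):
    """Retry the same POST (same token) until a 202 comes back."""
    token = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            if service.post(token, message) == 202:
                return attempt
        except TimeoutError:
            continue  # blindly try again
    raise RuntimeError("gave up without an acknowledgement")


service = FlakyService(["lost_request", "lost_response"])
attempts = send_reliably(service, "transfer $10")
```

Here the first attempt never reaches the service and the second is recorded but its acknowledgement is lost, so the sender succeeds on the third try while the service has accepted the message exactly once.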

I find I am able to look beyond the superficial characteristics of REST that seem like showstoppers to JJ. I can see simple workarounds and Best Practice guidelines that can make a RESTful application fully compliant with SOA principles. JJ stops at the characteristics as given and concludes that REST is horribly antithetical to SOA.

6 comments:

Integral ):( Reporting said...

Ganesh,

thanks for reaching out. Unfortunately the architecture and the semantics of REST are fairly clear.

SOAP assumes one endpoint for a series of operations. If I want to change the location of my code (in case my company gets bought or if I deliver a new system with some backwards compatible APIs), the client just changes the endpoint and voila, it is ready to work.

REST on the other hand assumes a very large number of endpoints (several per data record). If you want to change the location of your data, you cannot simply instruct your consumer applications to aim at a different endpoint (often because they use it as an ID to retrieve data). You can use the redirect capabilities of a proxy, but can you imagine the nightmare of managing the redirects for large numbers of records and complex data models? REST is clear: you should use bookmarks to enter a URI space and you should never ever store an "ID" on the consumer side.

The examples you are taking are very simple, yet the really interesting ones are the ones that involve complex relationships:

/customers/{guid}/contacts/preferred/{p_guid}
/customers/{guid}/contacts/regular/{p_guid}

let's say now that you give up on the idea of having preferred and regular contacts. What happens to these IDs?

/customers/{guid}/contacts/{another_guid}
...

What happens to the first URL? Is it now invalid? The problem that REST creates is that the identity of the record is coupled (strongly, very strongly coupled) to the way you access the record (query/update). This is terrible.

How does versioning work in REST?
Is /v1/customers/{guid}/contacts/preferred/{p_guid} a good strategy? Obviously no, right? Because all of a sudden all the "ids" held by the consumer become invalid when you go to v2.

How do you implement forwards compatible versioning strategies (http://www.ebpml.org/blog/217.htm)? Yes, content negotiation kind of works, but it multiplies the number of application types. Plus, you have to have one content type for every freaking URL that you have. A typical data model will have hundreds, possibly thousands of little message types, for no other reason than that REST couples access and identity.

With respect to PUT and POST and their semantics, please see my post: http://www.ebpml.org/blog/213.htm
There is nothing new here. Either you CRUD (which I hope we agree is very bad in terms of coupling) or you encode traditional actions into PUT and POST statements, yet again with an infinite number of URIs.

If you want to see how different REST is from SOAP, please see this other post: http://www.ebpml.org/blog/214.htm
I have highlighted in yellow and green how different the developer experience is with SOAP and REST. :-) Sooo different... You can also see that there is not a single REST framework where the interface is decoupled from the implementation (as you seem to imply). The URI is bolted onto the object method without the loose coupling provided by a mediation pattern (as supported by traditional and federated ESBs).

The sad reality is that REST brings nothing new, nothing new at all. I can appreciate that Amazon, Google, Yahoo... are using it for scalable, small footprint, slow changing APIs. But for IT, REST is a disaster of cataclysmic proportions.

In the end, I have no vendor ties, I am not emotionally attached to any standard. I am just an observer of the terrible waste created by these stupid wars where no one wins. We should only speak abstractly about architecture principles and let people choose the best way to implement them for their solution and their context.

In the end there is no silver bullet, and if people were to think at the abstract level instead of the technology level, we would all win and we would have an objective way to discuss where principles apply and why one is better than another.

Ganesh Prasad said...

JJ,

There are many points here, but let's start with the basics.

You make this point ever so often, but I have still not understood what you mean by "coupling identity and access".

Could you please provide a simple example?

Regards,
Ganesh

Integral ):( Reporting said...

Ganesh:

sorry, I was in meetings all day. This is actually quite easy to understand.

REST was invented for the Web of pages. Tim Berners-Lee set the foundation in 1989.

URIs are both "uniform" and "unique". I know U is for uniform, but a web page's "location" is unique. You can't have two pages at the same spot. You could have the "same" page at two different URLs, but logically, they are also two different instances even though they look the same. Uniform is super important of course because no one could imagine a Web where you would have to remember how to go to a page. The Web started there, URIs, pages and links. HTTP was kind of formalized on that foundation.

Of course, very quickly (NeXT's WebObjects was the first dynamic HTML framework and I saw it demoed by Steve Jobs at the Moscone Center in 1995), people understood that the Web could do more than pages. Data became the name of the game, and dynamic HTML was the weapon you could use to kill the competition. Imagine Amazon without dynamic HTML? Yet that's where everyone started. Yahoo? Google? ... Nothing we see today would be possible without dynamic HTML, yet REST was built on a completely different foundation.

The problem when you have data instead of a Web page is that data needs two things: an identity, like a page, a key of some sort, and of course it needs "access", because I don't really just get a page of data, I slice it and dice it in different ways, and data has relationships. So people started to invent all kinds of URI syntaxes to bake in "access semantics". When you say GET /customers?age>35 you are actually "accessing" data; this URI is not an identity. Sure enough, the RESTafarians said, easy, there is something called a Resource whose representation is all the customers older than 35, but this does not solve the problem of the coupling between access and identity. Access changes, identity never changes. In the case of a Web page, access never changes: GET, PUT, DELETE is all you do to a Web page. Access is actually uniform. In the case of data, business rules change. Let's assume I can PUT customer data and all of a sudden I come up with a new business rule. Physically, my "access" now has two URIs:
PUT /customers/v1/1234 and PUT /customers/v2/1234
but the identity is still /customers/1234.

This is the problem of coupling Access with Identity. Telling people that you GET X, then you PUT X or DELETE X is simply a fallacy. With data, this is simply not true. I am not here to defend WS/SOAP (remember, I created WSPER - Web Service Process Event Resource - to show that the solution lies well above REST or SOAP). In WS/SOAP you don't have a uniform identity :-) You create your own identity scheme. That's not good; uniform identity is good. However, you have the ability to associate two endpoints with two different sets of business rules. So neither REST nor WS/SOAP works.

But in the end, I am sorry, the programming model associated with REST simply does not work. REST is not a programming model: this coupling between Access and Identity is a terrible flaw. I can sure bake in some programming semantics in a URI syntax, but it is vastly incomplete. This is why people like Microsoft created OData. They realized that really all you can do in a general way is CRUD data, just like you CRUD web pages. The v1 and v2 business logic is above the REST interface in that case. I hope that we agree that this is bad. The business logic that is reusable at the resource level is precisely the resource lifecycle (http://www.ebpml.org/blog/30.htm) and OData assumes that business logic is above, in the consumer, implemented in a non RESTful way.

Hope this is clearer.

JJ-

Ganesh Prasad said...

JJ,

There are a few concepts we're dealing with here, and we need to take them up separately. I'll deal with one of them in my next post - your belief that REST is nothing but Distributed Objects.

Regards,
Ganesh

Integral ):( Reporting said...

Yes, this is a key concept too. REST is seen by many (such as Steve Vinoski, Bill Burke or Stefan Tilkov) as "CORBA done right".

JJ-

Tiago Silveira said...

... and then there's debugging.

going through pages of opaque URLs in logs can be tough.

But yes, even if humans make assumptions about what URLs refer to and relationships between two identifiers, machines shouldn't.