Thursday, August 02, 2012

Best Practice For The Use of PATCH in RESTful API Design


This topic deserves standalone treatment. I wrote before in the context of SCIM that PATCH is a relatively recent HTTP verb that is being recommended for use in situations where a resource is to be partially modified. In comparison, PUT is a bit of a sledgehammer, because it requires the entire resource to be replaced in toto using a fresh representation.

So the advent of PATCH is a welcome development no doubt, but one immediate problem is posed by current web infrastructure. Not all infrastructure components understand PATCH. Firewalls are suspicious, proxies could mangle or drop messages containing the unfamiliar verb, not all web servers are geared to handle it, and so on. Heck, even the ultramodern HTML 5 spec has no support for the PUT and DELETE verbs in forms, so asking for PATCH support is a bit ambitious at this point.

However, I'm not too worried about those problems because Time is a great infrastructure deployer. (Besides, there's nothing I can do to help on that front.)

The bigger problem I see is that we as API designers don't really know how to use PATCH. There is no guidance around its use other than to "use it for partial updates". Everyone agrees that we need more fine-grained control over updating resources, but this level of granularity needs detailed specification. We simply don't have that anywhere.

Well, we've got it now, and remember that you read it here first :-).

First off, let's take a JSON model of a resource because it's the most popular and everyone understands it. (Yes, I know, I'm one of the SOFEA authors, but I don't like XML either.)

You should know that the JSON family actually comprises two siblings, one good and one evil. Abel and Cain. Thor and Loki. Edwin Booth and John Wilkes Booth. Take your pick.

I'm going to call them JSON the Argonaut and JSON Voorhees (of Friday the 13th notoriety).

JSON the Argonaut is the good sibling. He buys a ticket for every member of his family when he goes to see a show.

JSON Voorhees is the bad one. He buys a single ticket for himself, and gets all his kids to slip under the turnstile.

And you wonder why it's so hard to get good seats at the show.

You have no idea what I'm talking about, do you?

Check out the JSON website:

JSON is built on two structures:
  • A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
  • An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

See the connection? The first structure describes JSON the Argonaut, who ensures that every member of his family has a ticket (a label or key). The second structure describes JSON Voorhees, who gets away with a single ticket (a label or key) for his entire family. The first thing we need to do therefore, before we even start looking at PATCH, is to fix JSON Voorhees. By which I mean, get him to buy tickets for all his kids.

This is what JSON Voorhees's family looks like when he slips them past with a single ticket:

This is what we want his family to look like when he buys all the tickets he needs to:


You see? One label per value.

In other words, when working with JSON objects, convert all arrays to dictionaries.
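As a minimal sketch of that conversion (not a prescribed implementation), the function below walks a resource and replaces every array with a dictionary. The resource contents, the function name `arrays_to_dicts` and the `k1`, `k2` key format are all assumptions made for illustration; a real server might generate UUIDs or some other stable identifier instead.

```python
from itertools import count


def arrays_to_dicts(value, next_id=None):
    """Recursively replace every list with a dict keyed by generated labels."""
    if next_id is None:
        # Hypothetical key generator: k1, k2, ... unique within this resource.
        next_id = count(1)
    if isinstance(value, list):
        result = {}
        for item in value:
            key = f"k{next(next_id)}"
            result[key] = arrays_to_dicts(item, next_id)
        return result
    if isinstance(value, dict):
        return {name: arrays_to_dicts(item, next_id) for name, item in value.items()}
    return value  # scalars are left untouched


# One ticket (label) for the whole family ...
resource = {
    "name": "Jason",
    "email-addrs": ["jason@example.com", "voorhees@example.org"],
}

# ... versus one ticket per family member.
converted = arrays_to_dicts(resource)
print(converted)
```

After the call, every email address sits under its own key, so each one can be referred to individually.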

[If the sequence of items in the array is important and you don't want to lose that when turning it into a dictionary, generate keys that are in sequence, only taking care to see that there are enough "gaps" between keys so that new items can be inserted between them later, if required. Something like a B-Tree algorithm will be useful, I think.]
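The gapped-key idea can be sketched as follows. This uses a plain midpoint between two neighbouring keys rather than a B-Tree, and the gap size of 1000 and the function names are assumptions. The keys are integers here for simplicity; in actual JSON they would have to be serialized as strings.

```python
GAP = 1000  # assumed spacing between generated keys


def initial_keys(items):
    """Assign ordered integer keys GAP, 2*GAP, ... to the items of a list."""
    return {(i + 1) * GAP: item for i, item in enumerate(items)}


def insert_between(mapping, before_key, after_key, item):
    """Insert item under a new key midway between two existing keys."""
    new_key = (before_key + after_key) // 2
    if new_key in (before_key, after_key):
        # The gap is used up; the keys would have to be regenerated.
        raise ValueError("no gap left between the two keys")
    mapping[new_key] = item
    return new_key


steps = initial_keys(["wash", "dry"])
new_key = insert_between(steps, 1000, 2000, "rinse")

# Sorting by key recovers the intended sequence.
ordered = [steps[k] for k in sorted(steps)]
print(new_key, ordered)
```

Existing keys never change when a new item is inserted, which is the property we want for stable addressing.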

When we POST a resource containing arrays to a server, the server should convert them into dictionaries and return the new representation of the resource to us. All the generated keys should be in the new representation. And this is Goodness with a capital G, because now, all of a sudden, PATCH has a much easier job to do.

[Why can't we just use positional indexes, e.g., email-addrs[i]? Think about what will happen when one client does a GET and decides it needs to delete email-addrs[1]. In the meantime, another client deletes email-addrs[1], resulting in email-addrs[2] becoming the new email-addrs[1]. Then the first client will end up deleting the wrong attribute. Yes, you can check ETags with conditional headers such as If-Match before an update to stop this from happening, but you will then end up with a large number of failed updates under high concurrency, which can be avoided with unique and stable identifiers.]
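The race described above can be simulated in a few lines. This is only an in-memory illustration with made-up addresses and keys; no HTTP is involved, and the two "clients" are just consecutive statements.

```python
# Positional indexes: client A reads the list and picks index 1 to delete.
addrs_list = ["a@example.com", "b@example.com", "c@example.com"]
target_index = 1               # client A means b@example.com
del addrs_list[1]              # client B deletes b@example.com first
del addrs_list[target_index]   # client A's stale index now hits c@example.com

# Stable keys: the same sequence of events with a dictionary.
addrs_dict = {"k1": "a@example.com", "k2": "b@example.com", "k3": "c@example.com"}
target_key = "k2"              # client A means b@example.com
addrs_dict.pop("k2", None)     # client B deletes it first
removed = addrs_dict.pop(target_key, None)  # client A finds nothing to delete

print(addrs_list)   # the wrong address is gone
print(addrs_dict)   # only the intended address is gone
```

With positional indexes the second delete silently removes the wrong value. With stable keys it simply finds that the key no longer exists.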

Now PATCH has a unique handle on every attribute value within a resource, no matter how complex or how deeply nested it is. Every value has a unique key that can be expressed as

first.second.third.fourth.fifth

if it's (say) five levels down from the top level resource URI.
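Resolving such a dotted key against a resource is straightforward once every level is a dictionary. The sketch below assumes that keys never contain a dot themselves; the resource layout and the name `resolve` are hypothetical.

```python
def resolve(resource, dotted_key):
    """Follow a dot-separated key down through nested dictionaries."""
    node = resource
    for part in dotted_key.split("."):
        node = node[part]  # raises KeyError if any level is missing
    return node


resource = {
    "contact": {
        "email-addrs": {
            "k1": "jason@example.com",
            "k2": "voorhees@example.org",
        }
    }
}

value = resolve(resource, "contact.email-addrs.k2")
print(value)
```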

Now we can use the same REST concepts that we had before PATCH, to operate on each of these. The coarse-grained verbs used to operate on resources can now be used to operate on parts of a resource as well.

POST to add a new attribute to a collection
PUT to replace the value of an attribute
DELETE to make an attribute inaccessible (and not necessarily delete it physically, of course)

In place of a URI, we specify the key of the attribute within the resource, using the dot-separated notation we just showed.

There's just one small detail. Actually two.

1. PATCH is turning out to be an operation container rather than an operation in itself. This means we're going to have to nest other verbs within it to operate on different parts of a resource. There's nothing to stop us from bundling operations to add, modify and delete various parts of a resource whose top-level URI the PATCH request targets. In fact, that's probably the kind of elegance we're after. But then we wouldn't want any confusion between these nested operations and the coarse-grained HTTP verbs used to operate on resources, so we should perhaps look for equivalent verbs to POST, PUT and DELETE to use inside of PATCH.

2. When we think about it, the semantics of the HTTP PUT verb are a bit unclear (at least to me). Do we intend to replace a resource that is already there, or do we intend to create a new resource with a URI of our own choosing (as opposed to a server-decided URI)? And what constitutes an error in either case (because they're the exact opposite conditions)? We probably need to split PUT into its subtypes to reflect these separate "create" and "update" semantics, and for good measure define a third subtype to cover the "create-or-update" case, which is also very likely to be required. While we're at it, we may want to narrow the scope of POST to only mean "add to a collection", because POST as it's used today is a bit of a catch-all command in REST when none of the others quite applies.

So with these two drivers in mind, let's look at what new verbs we could use as nested operations within a PATCH request.
  1. INCLUDE (equivalent to POST): Add the given attribute value to a collection and return the unique key generated for it.
  2. PLACE (equivalent to one form of PUT): Add the given attribute key-value pair at the location implied by the key. (If there’s already an attribute with this key, return an error status.)
  3. REPLACE (equivalent to another form of PUT): Replace the current value of the given key with this new value. (If there’s no such key, return an error status.)
  4. FORCE (equivalent to a third form of PUT): This means PLACE or REPLACE. (At the end of this operation, we want the specified key to refer to the given value whether the key already existed or not.)
  5. RETIRE (equivalent to DELETE): Delete, deactivate or otherwise render inaccessible the attribute with the given key.
  6. AMEND (equivalent to PATCH): (This verb is just listed for completeness. We probably don’t need a nested PATCH since PATCH cascades to every level of the tree.)

And here's a picture that illustrates far more effectively than a thousand words how PATCH will then work. (There's a minor syntax error in the PATCH command. Each element in the "operations" array needs to be surrounded with curly braces.)
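As a rough sketch of how a server might process the nested operations, consider the code below. Only the "operations" array name comes from this post; the "verb", "key" and "value" field names, the generated key format, and the choice of 404, 409 and 400 for the error cases are assumptions of mine, not a settled format.

```python
def _parent_and_leaf(resource, dotted_key):
    """Return the dict that holds the last key segment, plus that segment."""
    *parents, leaf = dotted_key.split(".")
    node = resource
    for part in parents:
        node = node[part]
    return node, leaf


def apply_operation(resource, op):
    """Apply one nested operation in place; return (status, generated_key)."""
    verb, key, value = op["verb"], op["key"], op.get("value")
    try:
        if verb == "INCLUDE":
            # Here the key names the collection that receives the new value.
            collection = resource
            for part in key.split("."):
                collection = collection[part]
            if not isinstance(collection, dict):
                return 409, None  # assumed status: target is not a collection
            n = len(collection) + 1
            while f"k{n}" in collection:  # hypothetical key format k1, k2, ...
                n += 1
            collection[f"k{n}"] = value
            return 201, f"{key}.k{n}"

        parent, leaf = _parent_and_leaf(resource, key)
        exists = leaf in parent
        if verb == "PLACE":
            if exists:
                return 409, None  # assumed status: key already present
            parent[leaf] = value
            return 201, None
        if verb == "REPLACE":
            if not exists:
                return 404, None  # assumed status: no such key
            parent[leaf] = value
            return 200, None
        if verb == "FORCE":
            parent[leaf] = value
            return (200 if exists else 201), None
        if verb == "RETIRE":
            if not exists:
                return 404, None
            del parent[leaf]  # a real server might only deactivate the value
            return 200, None
        return 400, None  # assumed status: unknown verb
    except (KeyError, TypeError):
        return 404, None  # some level of the dotted key does not exist


resource = {
    "name": "Jason",
    "email-addrs": {"k1": "a@example.com", "k2": "b@example.com"},
}
patch = {
    "operations": [
        {"verb": "INCLUDE", "key": "email-addrs", "value": "c@example.com"},
        {"verb": "REPLACE", "key": "name", "value": "Jason the Argonaut"},
        {"verb": "RETIRE", "key": "email-addrs.k1"},
        {"verb": "PLACE", "key": "name", "value": "Someone else"},
    ]
}
outcomes = [apply_operation(resource, op) for op in patch["operations"]]
print(outcomes)
print(resource)
```

In this run the last operation fails because "name" already exists, while the first three succeed, which is exactly the mixed outcome that a per-operation status has to report.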


The final piece of the puzzle is the PATCH response.

If PATCH is to be used as an operation container with multiple nested operations, then the most appropriate status code is the WebDAV-defined "207 Multi-Status". Each nested operation must have its own status code, because some may succeed while others fail. Even with successful operations, there is a difference between "200 OK" and "201 Created", because with "201 Created", we expect to find the server-generated key accompanying the response.

Here's what the response to the above PATCH command could look like.


(As with the PATCH request, there's a minor syntax error in the PATCH response. Each element in the "results" array needs to be surrounded with curly braces.)
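A response along these lines could be assembled as sketched below. The "results" array name is taken from this post; the "status" and "key" field names and the sample outcomes are assumptions for illustration.

```python
import json


def multi_status(outcomes):
    """Wrap per-operation (status, generated_key) pairs in a 207 response."""
    results = []
    for status, key in outcomes:
        entry = {"status": status}
        if status == 201 and key is not None:
            # Only a creation carries a server-generated key back to the client.
            entry["key"] = key
        results.append(entry)
    return 207, {"results": results}


status, body = multi_status([(201, "email-addrs.k3"), (200, None), (404, None)])
print(status)
print(json.dumps(body, indent=2))
```

The overall status is always 207 in this sketch; the client has to inspect each entry to learn which operations succeeded.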

To sum up,

  1. Ensure that when resources are created, all their array-type elements are replaced by dictionaries. Every value should be independently addressable using its own unique key.
  2. Use one of 5 different operations (INCLUDE, PLACE, REPLACE, FORCE and RETIRE) within PATCH to refer to various keys and specify new values. There is no possible confusion between the scope of these verbs and those of the HTTP verbs POST, PUT and DELETE, and their meanings are also unambiguous.
  3. Use the "207 Multi-Status" code in the response to the PATCH itself, and nest the status codes for each individual operation inside the response.

And that, in a nutshell, is what I hope will be considered Best Practice around the use of PATCH!

7 comments:

Hendy said...

Thank you Ganesh ! What a wonderful spec !

Some notes :

1. I'm not entirely convinced of converting all arrays to maps. Navigating arrays can be done via the usual "[i]" syntax. And not all arrays are the same, some arrays are treated as a single attribute value (no need to operate on individual elements inside array). An example would be [latitude, longitude, elevation] or [x, y, z] attributes. (yes, these are tuples, but JSON doesn't recognize tuples, does it?)

2. Would you be so kind to provide an improved version of PATCH with idempotent semantics? i.e. optimistic updates with criteria matching.

For example:

Make sure account.balance is currently 200,
then: Update account.balance to 400
else: Error (inconsistent data)

Hendy said...

How to distinguish between a single result, multiple results, an error, multiple errors, and metadata (e.g. count, available entities/attributes/object structure) in a RESTful JSON response ?

prasadgc said...

Thanks, Hendy.

1. I don't believe we can use positional indexes to reference array elements because they are not stable. If you delete addr[1], then addr[2] moves up to become the new addr[1]. It won't work well at all when multiple clients are querying and updating a resource.

2. Idempotence as in POST can still be used. Standard mechanisms such as ETags and conditional headers like If-Match can be used to guarantee idempotence. It's not specific to PATCH.

3. See the PATCH response in my example. Doesn't it cover all these cases?

Regards,
Ganesh

Andrei Neculau said...

This comment has been removed by the author.

Andrei Neculau said...

Nice post. A fresh look at PATCH is always welcome.

My 2 penny:

#1 "Yes, you can check the Etags and the if-not-modified headers before an update to stop this from happening, but you will then end up with a large number of failed updates under high concurrency, which can be avoided with unique and stable identifiers"

It looks like what you're after here is stale data that needs to be double-checked in order to trust it.

Here's a non-array-related situation:
your family has 2 cars, both available to sell but only one should be sold so that you can still move around. You and your son are selling them at the same time, your son sells one (http request), you sell one (http request).
Both requests succeed, because you have no if-match or if-not-modified checks - and surprise! No car to drive around.

The examples can go on.
This is not a matter of array indices vs object keys. It's a matter of fresh-data.

#2 in my world (hopefully the majority's too), HTTP requests are atomic. When I do a POST/PATCH/PUT, I intend all of my "commands" to be obeyed, or else to know why this fails. Otherwise, again you have a mess.

With that said, PATCH as a batch verb.. does not appeal to me whatsoever.

#3 I do see an "evil JSON" though, for some update cases, but
https://github.com/algesten/jsondiff
and even https://github.com/benjamine/JsonDiffPatch
don't look bad at all

nprasanna said...

Does Tomcat support the PATCH HTTP method? Please update.

API HUB said...

PUT and PATCH are HTTP methods used for updating resources on a server. PUT replaces the entire resource with the new data, while PATCH updates only the specified fields. PATCH is preferred for partial updates, as it reduces network traffic and allows for more efficient resource management.