Monday, July 09, 2012

A RESTful Protocol to Add/Modify/Delete Elements of a Collection


I've been reading the SCIM specification recently.

[What does SCIM stand for? There's a moral-of-the-story here, by the way. SCIM originally stood for Simple Cloud Identity Management. Now it stands for System for Cross-domain Identity Management. The moral of the story is, if you're swayed by the buzzword of the day, you will be forced into embarrassing retreat at some point. Cross-domain Identity Management was what it was always about. What a shame someone felt the need to jump onto the cloud bandwagon.]

Anyway, coming back to the topic of this post, I realised that one of the aspects dealt with by the SCIM specification was a much more general problem that would benefit from a standard approach.

I was struck by the similarity of 3 types of operations in the SCIM protocol, though not impressed by any of them.

One set of operations deals with adding, modifying and deleting multi-valued attributes of a user, e.g., the set of addresses or email IDs.

A second set deals with adding, modifying and deleting users as members of a group.

SCIM uses a RESTful API, so for these sorts of partial updates to a larger entity, it uses PATCH. A single PATCH command can be used to simultaneously add, modify and delete attributes of an entity, or to add, modify and delete members of a group. [PATCH when used to update one or more single-valued attributes is fairly straightforward, so I'm not talking about that here.] I suspect that PATCH is so new that best practice around its use has yet to emerge, which accounts for some of the ugliness I'm about to describe.

A third set of operations is called 'bulk operations', and deals with a collection of operations on multiple entities, which could include add, modify and delete operations. This is physically formatted as a POST with nested POST, PATCH and DELETE operations as attributes within the document.

To my mind, these three situations are conceptually the same (a set of fine-grained operations grouped into a single composite operation), therefore the same mechanism should ideally be used for all of them.

As it turns out, SCIM uses many different mechanisms, all ugly.


(In the spec document linked above, scroll up and down a bit to search for these examples)

Some of the ugliness comes from passing 'operation:"delete"' as an attribute of the document. After arguing for years that a GET request with URL parameter "?operation=delete" is not RESTful, we can't now barefacedly offer a "RESTful" API that does the same thing. True, this is not a GET, so it's probably not as awful, but this is a really bad "smell" that suggests poor design at some level.

The syntax to replace a user's primary email address with another one and simultaneously demote the previous primary email address to the status of just another email address is more than just ugly, it's horrific. Do have a look for yourself, if you can keep from gagging.


(In the spec document linked above, scroll up and down a bit to search for this example)

Deleting all members from a group is even uglier. There is nothing in the message to suggest that a deletion is taking place! There's a "meta" attribute that has a value equal to the set of attribute names whose (multiple) values are being deleted. If that confuses you, you now know what I mean.


Bulk operations are different again. The main operation is a POST to an arbitrary URI, which could be called "/bulk" or something similar, and nested within it are the individual operations on the various entities.

As you can see, these are 3 entirely different mechanisms for essentially the same desired behaviour. Can't we use the REST philosophy of polymorphic verbs to make this interface more uniform?

I would like to propose a simpler scheme that uses explicit identifiers for every sub-entity that may need to be independently manipulated (updated or deleted). This is similar to having resource identifiers, but at finer levels of granularity.

All three sets of operations should have a common structure like this:

PATCH /main-entities/29d70c13-8651-41ce-97f8-c82ccdfcd6d6
(or POST /bulk-operations)
Host: example.com
Accept: application/json
Content-Type: application/json
Authorization: Bearer h480djs93hd8
Content-Length: ...

{
  operations:
  [
    {
      operation: "POST",
      URI: "/sub-entities",
      sub-entity:
      {
      ...sub-entity attributes
      }
    },
    {
      operation: "PUT",
      URI: "/sub-entities/dced46ca-b96d-44dd-b494-41e5de9405b6",
      sub-entity:
      {
      ...sub-entity attributes
      }
    },
    {
      operation: "DELETE",
      URI: "/sub-entities/12feb1db-8031-4a23-8810-bccfffb13f56"
    }
  ]
}

In other words, the nested operations should be RESTful operations in their own right, with standard HTTP verbs and standard URIs that use the enveloping entity's URI as their base URI. How do we know what the URIs of the sub-entities are? The server should generate one for every array element that is created (i.e., for every value of a multi-valued attribute of an entity), and return those URIs with the "201 Created" response, as well as in response to any GET on the main entity. That then gives the client a handle with which to specify fine-grained operations on the sub-entities.
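To make the identifier idea concrete, here is a minimal sketch in Python of how a server might tag each element of a multi-valued attribute with its own URI, and how a nested operation's URI could be resolved against the enveloping entity's URI. The function names (assign_sub_entity_uris, resolve), the "URI" key on each element and the "/users/123" path are my own assumptions for illustration, not anything taken from SCIM.

```python
import uuid


def assign_sub_entity_uris(entity, collection_names):
    """Return a copy of `entity` in which every element of the named
    multi-valued attributes carries its own server-generated URI.
    (Hypothetical helper; the "URI" key is an assumed convention.)"""
    tagged = dict(entity)
    for name in collection_names:
        tagged[name] = [
            # Keep an existing URI; mint a new UUID-based one otherwise.
            {**element, "URI": element.get("URI") or f"/{name}/{uuid.uuid4()}"}
            for element in entity.get(name, [])
        ]
    return tagged


def resolve(base_uri, sub_uri):
    """Resolve a nested operation's URI against the enveloping entity's URI."""
    return base_uri.rstrip("/") + "/" + sub_uri.strip().lstrip("/")


# Example: a user with two email addresses (illustrative data only).
user = {
    "userName": "alice",
    "emails": [{"value": "a@example.com"}, {"value": "b@example.com"}],
}
tagged = assign_sub_entity_uris(user, ["emails"])
for email in tagged["emails"]:
    # Each address now has a handle the client can PUT to or DELETE.
    print(resolve("/users/123", email["URI"]))
```

A representation like `tagged` is what the server would return from a GET on the main entity, so the client never has to guess or construct a sub-entity URI itself.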

I also don't like the SCIM convention of returning a "200 OK" even for a bulk request where different sub-operations may have had different outcomes. This is as bad as HTTP tunnelling in SOAP where the HTTP response could be a cheery "200 OK", but the message inside contains a SOAP fault.

There is a WebDAV status code for exactly this kind of situation: "207 Multi-Status". That's what should be used.

HTTP/1.1 207 Multi-Status
Content-Type: application/json

{
  operation-results:
  [
    {
      operation: "POST",
      URI: "/sub-entities",
      status: "201 Created",
      Location: "/sub-entities/c44fcc7b-8881-4f0f-b5fb-b2344615cc34",
      request:
      {
      ...sub-entity attributes repeated for verification
      }
    },
    {
      operation: "PUT",
      URI: "/sub-entities/dced46ca-b96d-44dd-b494-41e5de9405b6",
      status: "409 Conflict"
    },
    {
      operation: "DELETE",
      URI: "/sub-entities/12feb1db-8031-4a23-8810-bccfffb13f56",
      status: "404 Not Found"
    }
  ]
}

This is a clean syntax, easily understood, and uniformly applicable when dealing with attributes or sub-entities of an entity, or with bulk operations. All it requires is for resources and resource collections to be identifiable at finer levels of granularity as well, i.e., attributes that are array elements.
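As a rough illustration of how little machinery this needs on the server side, here is a sketch in Python that applies a list of nested operations to an in-memory collection and builds the per-operation results for a 207 response. Everything here is an assumption made for the example: the function name apply_operations, the dict-based store keyed by sub-entity URI, the new_id parameter, and the choice of "204 No Content" for a successful DELETE. A successful POST reports "201 Created", in line with the "201 Created" response mentioned earlier. Conflict detection ("409 Conflict") would need versioning of some kind and is not modelled.

```python
import uuid


def apply_operations(collection, operations, new_id=lambda: str(uuid.uuid4())):
    """Apply nested POST/PUT/DELETE operations to `collection`, a dict that
    maps sub-entity URIs to their attributes. Returns the overall HTTP
    status code and a body with one result per operation.
    (Hypothetical sketch, not a SCIM API.)"""
    results = []
    for op in operations:
        verb, uri = op["operation"], op["URI"].strip()
        result = {"operation": verb, "URI": uri}
        if verb == "POST":
            # The server mints the identifier of the new sub-entity.
            location = f"{uri}/{new_id()}"
            collection[location] = op["sub-entity"]
            result.update(status="201 Created", Location=location)
        elif verb not in ("PUT", "DELETE"):
            result["status"] = "405 Method Not Allowed"
        elif uri not in collection:
            # Assumption: PUT never creates, since URIs are server-generated.
            result["status"] = "404 Not Found"
        elif verb == "PUT":
            collection[uri] = op["sub-entity"]
            result["status"] = "200 OK"
        else:  # DELETE
            del collection[uri]
            result["status"] = "204 No Content"
        results.append(result)
    # One outer status, with the individual outcomes carried in the body.
    return 207, {"operation-results": results}


# Example run against a tiny in-memory store (illustrative data only).
store = {"/sub-entities/a": {"value": "old"}}
status, body = apply_operations(store, [
    {"operation": "POST", "URI": "/sub-entities", "sub-entity": {"value": "new"}},
    {"operation": "PUT", "URI": "/sub-entities/a", "sub-entity": {"value": "changed"}},
    {"operation": "DELETE", "URI": "/sub-entities/missing"},
])
print(status, [r["status"] for r in body["operation-results"]])
```

The same loop serves an entity's multi-valued attributes, a group's members, or a bulk request; only the base URI and the backing store differ.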

I don't believe that's too much of an ask. UUIDs are cheap and plentiful, and the alternative is an ugly mess.

2 comments:

Phil Hunt said...

By main entity, are you referring to a user and/or group?

By sub-entity are you referring to an attribute?

If so, that wouldn't be resource centric. You'd have to make multiple calls to change the state of a single entity to a particular desired state if multiple attributes are affected.

Similarly, if you have a great many (>100K) entities to update, a "batch" mode is desirable. That's really what bulk was intended for.

That said, I do agree the patch operation for multi-valued attributes is just horrible.

prasadgc said...

Phil,

I'm using "main entity" and "sub-entity" in a generic sense to cover the cases of an entity with attributes and a group entity with member entities.

I'm explicitly suggesting that if an entity has a multi-valued sub-entity, then the sub-entity be modelled as a resource in its own right, otherwise updating or deleting one of those values becomes a nightmare.

Regards,
Ganesh