The Wisdom of Ganesh: 2007

Friday, December 28, 2007

The Anti-Mac Interface and SOA

SOAP and REST - I must sound like a cracked record.

An area I've started exploring is expressive power. Which paradigm allows us to talk about a domain in a richer way?

Many years ago, usability gurus Don Gentner and Jakob Nielsen wrote a provocatively-titled article called The Anti-Mac Interface. They hastened to explain that they were fans of the Mac, and that their paper was an attempt to explore what the human user interface would be like if each of the Macintosh's design principles was systematically and deliberately violated.

I will quote a couple of paragraphs from that paper here, especially where they talk about the limitations of the friendly point-and-click paradigm.

The see-and-point principle states that users interact with the computer by pointing at the objects they can see on the screen. It's as if we have thrown away a million years of evolution, lost our facility with expressive language, and been reduced to pointing at objects in the immediate environment. Mouse buttons and modifier keys give us a vocabulary equivalent to a few different grunts. We have lost all the power of language, and can no longer talk about objects that are not immediately visible (all files more than one week old), objects that don't exist yet (future messages from my boss), or unknown objects (any guides to restaurants in Boston).

Note three important classes of objects that the "see-and-point" paradigm is unable to cater for:

1. resources that are not immediately visible
2. resources that don't exist yet
3. resources that are unknown

I've deliberately used the term "resources" there. Open-ended question: does REST's insistence on identifiable resources expose it to an analogous shortcoming of the kind Gentner and Nielsen have identified with GUI interfaces?

Gentner and Nielsen go on to say

If we want to order food in a country where we don't know the language at all, we're forced to go into the kitchen and use a see-and-point interface. With a little understanding of the language, we can point at menus to select our dinner from the dining room. But language allows us to discuss exactly what we would like to eat with the waiter or chef. Similarly, computer interfaces must evolve to let us utilize more of the power of language. Adding language to the interface allows us to use a rich vocabulary and gives us basic linguistic structures such as conditionals. Language lets us refer to objects that are not immediately visible. For example, we could say something like "Notify me if there is a new message from Emily." Note we are not advocating an interface is based solely on language. Neither does the interface have to understand full natural language. Real expressive power comes from the combination of language, examples, and pointing.

Again, I can't help wondering, does SOAP's free-format style give it greater expressiveness than REST with its deliberately constrained interface? Certainly the composition of services into processes backed up by rules engines provides the ability to formulate complex conditional expressions.

To continue with Gentner and Nielsen's delightful analogy, is homo restus then a caveman who has to point to something visible in his immediate environment and make one of four different grunts to indicate what he means? And is his opposite number a glib-tongued SOAP salesman unconstrained by language?

Interesting.

Thursday, December 27, 2007

No, Really. What is SOAP? What is BPEL?

Sometimes the answers to simple questions are profound.

I have been asking myself a couple of very simple questions for some time now, and although I initially didn't want to accept some of the answers I got, I guess I can't avoid their implications.

What is SOAP? I'm deliberately not looking at any of the SOAP spec documents for the answer. I want to know the value proposition of SOAP.

Similarly, what is BPEL (or WS-BPEL, if you prefer)?

I can think of two quick answers:

1. The value proposition of SOAP is interoperability.
2. The value proposition of BPEL is portability.

To my mind, interoperability is at least an order of magnitude more important than portability.

SOAP is the "wire protocol" in one view of SOA. Within the "cloud" or "SOA Fabric", the only things one sees are SOAP messages flying past. Nobody sees BPEL within the SOA cloud. BPEL is an implementation language for the execution of processes, just like Java is an implementation language for general-purpose business logic. It runs on a node. All interactions between that node and others are through SOAP messages. A BPEL process consumes SOAP-based services and in turn exposes itself as a SOAP service.

So is BPEL really all that important in the scheme of things? Wouldn't any other portable language do? I think I've stumbled upon a candidate language, and it's not Java.

Lately, I've been playing around with the WSO2 Mashup Server, and I'm increasingly getting the excited feeling of a kid who's somehow come into possession of a machine-gun. This is not a toy, folks. Are the adults aware of what this thing can do?

Most people seem to think mashups are a cute visual gimmick. The WSO2 guys themselves don't have the air of people handing out machine-guns. The examples they bundle with the server are classic mashup fare - TomatoTube, in which you take the top-rated movies from Rotten Tomatoes and mash the list up with trailers from YouTube, enabling you to see the trailers of top-rated movies. Very cute and harmless.

But now for the machine-gun bit. The development language offered by Mashup Server is JavaScript (think general-purpose programming language). JavaScript augmented by E4X (think really easy XML manipulation). Mashup Server hides SOAP very effectively, although its interfaces to the outside world are SOAP (SOAP 1.1 and 1.2, also REST/HTTP, but more about that later). SOAP is out there in the cloud. But here, within this processing node, it's just XML representations of data and JavaScript code to process it with, thanks to the why-didn't-I-think-of-it simplicity of E4X. Surely we can do more than mashups with that kind of power...

I've found myself thinking a disturbing thought: If no one sees BPEL in the forest, then does it really exist? What if BPMN-based process modelling tools spat out E4X-enhanced JavaScript code instead of BPEL? Would anyone know or care? Take the output of the modelling tool and drop the file onto a server. The process is ready to run. All external interfaces are SOAP-based, just like with a BPEL implementation. Got any problems with that?

There's more revolutionary potential here than in the Communist Manifesto. Another of the cutesy examples bundled with Mashup Server is a REST service. You can do HTTP GETs, PUTs, POSTs and DELETEs on a simple city resource to manipulate its weather data. Very harmless, but again, the developer has very simple JavaScript-based access to REST services.

So is the WSO2 Mashup Server the one that will bring balance to the Force? A powerful programming language. Laughably easy XML manipulation. Simple access to SOAP services and REST resources. Transparent publication of itself as a service or resource in turn. Isn't this the holy grail of service composition?

Content management, process orchestration, what's in a name? I'm beginning to think BPEL is dead. Its value proposition doesn't stack up anymore. SOAP still makes sense, but not BPEL.

WSO2 Mashup Server seems to be the industry's best-kept secret for now. Keep the safety catch on, and watch where you point that thing.

Why RPC is Evil, Redux

Why am I blogging about this again? I talked about RPC being evil before.

Well, Mark Little referred to my post on Paying the RESTafarians Back in Their Own Coin and went into some detail on RPC and distributed systems in general. I'm sure that Mark (having a PhD in Distributed Systems) understands the fundamental issue with RPC, but since he doesn't make the point obviously enough in his post, it's possible that some of his readers may not get it. So let me illustrate my point with an actual example.

To recap: RPC is evil because it tries to make remote objects look local, and the illusion fails subtly in certain use cases that are quite common, damaging the integrity of the applications involved. When I say "object" here, I am not referring to objects in the OO sense, but anything that exists in memory, even an integer.

When I first learned C programming (in 1986 - man, has it been 21 years already?), I learned that swapping two numbers with a function call wasn't so straightforward. The following naive example doesn't work:

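#include <stdio.h>

void swap( int x, int y ); // Declared before use so that main() can call it

int main( void )
{
    int a = 5;
    int b = 6;

    printf( "Before swapping, a = %d and b = %d\n", a, b );

    swap( a, b );

    printf( "After swapping, a = %d and b = %d\n", a, b );

    return 0;
}

void swap( int x, int y ) // x and y are copies of the caller's a and b
{
    int temp = x;
    x = y;
    y = temp;
}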

The printout of the program will be:

Before swapping, a = 5 and b = 6
After swapping, a = 5 and b = 6

Clearly, the naive approach doesn't work, because C functions are called with pass-by-value semantics. This means that the swap() function is given copies of the two variables a and b. What it swaps are the copies. The original variables a and b in the calling function are unchanged.

The correct way to do the swap in C, it turns out, is to pass the memory addresses of the two variables to the swap() function, which then reaches back to swap the two variables by dereferencing the addresses:

#include <stdio.h>

void swap( int *x_p, int *y_p ); // Declared before use so that main() can call it

int main( void )
{
    int a = 5;
    int b = 6;

    printf( "Before swapping, a = %d and b = %d\n", a, b );

    swap( &a, &b ); // Note: we're passing the addresses of the variables now

    printf( "After swapping, a = %d and b = %d\n", a, b );

    return 0;
}

void swap( int *x_p, int *y_p ) // The function receives the addresses of the variables, not the variables themselves
{
    int temp = *x_p; // The asterisk helps to dereference the address and get at the value held at that location
    *x_p = *y_p;
    *y_p = temp;
}

Now the printout says:

Before swapping, a = 5 and b = 6
After swapping, a = 6 and b = 5

Subtle point: Note that pass-by-value semantics are alive and well, but now we're passing copies of the addresses. That doesn't matter, because when we dereference the addresses, we find we're looking at the original variables (multiple copies of an address will all point to the same memory location).
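The subtle point can be checked with a small sketch. The helper names retarget(), write_through() and pointer_copy_demo() are made up for this illustration. The first helper overwrites its own copy of the address, which the caller never notices; the second dereferences the copy and so reaches the caller's variable.

```c
#include <assert.h>

/* Hypothetical helper: reassigns its own copy of the address. */
void retarget( int *p, int *elsewhere )
{
    p = elsewhere;  /* only the local copy of the address changes */
    *p = 99;        /* so this writes to 'elsewhere', not to the caller's variable */
}

/* Hypothetical helper: writes through the copied address. */
void write_through( int *p )
{
    *p = 42;        /* the copy still points at the caller's variable */
}

/* Self-checking walk-through of both cases. */
void pointer_copy_demo( void )
{
    int a = 5;
    int other = 0;
    int *a_p = &a;

    retarget( a_p, &other );
    assert( a_p == &a );     /* the caller's pointer was not redirected */
    assert( a == 5 );        /* and the caller's variable is untouched */
    assert( other == 99 );

    write_through( a_p );
    assert( a == 42 );       /* dereferencing the copy reached the original */
}
```

Calling pointer_copy_demo() from a main() function runs both cases; the assertions pass silently if the behaviour is as described.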

Now imagine that someone comes in offering to take my swap() function and run it on a remote host. They claim that they can make the two functions (the calling main() function and the swap() function) appear local to each other, so that I don't have to change either of them in any way. Is this claim credible?

This is exactly the promise of RPC, and if you believe the promise, you're in for a great deal of trouble.

What RPC will do is place a "stub" function on the machine running main(), and a "skeleton" on the machine running swap(). To main(), the stub function will appear like the original swap() function, and to swap(), the skeleton will behave like the original main() function. Between the two of them, the stub and skeleton will attempt to hide the presence of the network between main() and swap(). How do they do this?

The stub will quietly dereference the memory addresses being passed by main() and extract the values 5 and 6. Then it will send the values 5 and 6 over the network. The skeleton will create two integer variables in the remote system's memory and populate them with the values 5 and 6, then pass the addresses of these two variables to the swap() function. The swap() function will happily swap the two variables on the remote system, and the function call will return. Our printout will say:

Before swapping, a = 5 and b = 6
After swapping, a = 5 and b = 6

What happened? Didn't we pass the addresses of the variables instead of their values? Why didn't it work?

It didn't work because you cannot make pass-by-value look like pass-by-reference, and that's because memory references are meaningless outside the system they refer to. Think about this a little bit, and how we can solve the problem.
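The scenario can be simulated inside a single program, as a rough sketch. The names wire_msg, stub_swap(), skeleton_swap() and rpc_illusion_demo() are invented for this illustration, and an ordinary function call stands in for the network hop. This follows the scenario as described above and is not a model of any particular RPC framework.

```c
#include <assert.h>

/* The pointer-based swap() from earlier in this post, unchanged. */
void swap( int *x_p, int *y_p )
{
    int temp = *x_p;
    *x_p = *y_p;
    *y_p = temp;
}

/* Hypothetical wire message: only values can cross the network. */
struct wire_msg
{
    int first;
    int second;
};

/* Hypothetical skeleton on the remote host: rebuilds local variables
   from the received values and hands their addresses to swap(). */
void skeleton_swap( struct wire_msg msg )
{
    int remote_a = msg.first;
    int remote_b = msg.second;

    swap( &remote_a, &remote_b ); /* swaps the remote copies only */
    /* Nothing travels back in this sketch, so the swapped values are lost. */
}

/* Hypothetical stub on the calling host: same signature as swap(). */
void stub_swap( int *x_p, int *y_p )
{
    struct wire_msg msg;

    msg.first = *x_p;     /* dereference locally: addresses cannot travel */
    msg.second = *y_p;
    skeleton_swap( msg ); /* stands in for the network hop */
}

/* Self-checking walk-through: the caller's variables stay as they were. */
void rpc_illusion_demo( void )
{
    int a = 5;
    int b = 6;

    stub_swap( &a, &b );
    assert( a == 5 && b == 6 );
}
```

To main(), stub_swap() looks exactly like swap(), which is the whole problem: nothing in the call site warns that the addresses never leave the machine.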

Let's say another person approaches me now, offering to host my swap() function on a remote system, but this time, they make it clear that only pass-by-value will be supported and I will have to deal with it. In this case, I need to make some changes to my main() and swap() functions. I first declare a structure to hold two integers in a separate file as shown below:

/* File commondecl.h */
#ifndef COMMONDECL_H
#define COMMONDECL_H

// Declare structure tag for use elsewhere
struct number_pair_st_tag
{
    int first;
    int second;
};

#endif

I then use this common declaration to define a structure in both the main() and swap() functions:

/* File containing main() function */
#include <stdio.h>
#include "commondecl.h"

// Prototype of the (possibly remote) swap() function
struct number_pair_st_tag swap( int x, int y );

int main( void )
{
    int a = 5;
    int b = 6;

    // Define a structure to hold two variables
    struct number_pair_st_tag returned_pair_st;

    printf( "Before swapping, a = %d and b = %d\n", a, b );

    returned_pair_st = swap( a, b ); // Call by value and receive by value

    // Explicitly assign what is returned to my original variables.
    a = returned_pair_st.first;
    b = returned_pair_st.second;

    printf( "After swapping, a = %d and b = %d\n", a, b );

    return 0;
}

/* File containing swap() function */
#include "commondecl.h"

struct number_pair_st_tag swap( int x, int y )
{
    // Define a structure to hold the swapped pair
    struct number_pair_st_tag swapped_pair_st;

    // The swapping is more direct this time, no need for a temp variable.
    swapped_pair_st.first = y;
    swapped_pair_st.second = x;

    // Explicitly return the swapped pair by value
    return swapped_pair_st;
}

The printout will now read:

Before swapping, a = 5 and b = 6
After swapping, a = 6 and b = 5

The swap worked, and it worked because there was all-round honesty this time about the fact that variables would be passed by value. Yes, I had to re-engineer my code, but now my code will work regardless of whether the two functions are local to each other or remote.

A couple of points:
1. Isn't this wasteful for local calls?
2. Isn't this brittle? There's a shared file "commondecl.h" between the two systems.

Well, yes, of course it's wasteful for local calls. But seriously, how wasteful? We're talking about in-memory copying here, which is generally insignificant compared to the network latency we always incur in the case of remote communications. True, if objects are very large and the copying is done in iterative loops, the performance hit could become significant, but the fundamental issue is that correctness trumps performance. We would rather have systems that work correctly but less efficiently than systems that work incorrectly.

As for brittleness, that's what contracts are for in the SOA world. The "commondecl.h" file is part of the contract between the two systems. In fact, in the general case, we would be declaring both the passed and the returned parameters within such a contract, and we would expect both systems to honour that contract. Coupling systems to a contract rather than to each other is not brittle, it's loose coupling.

I hope these examples make it very clear why RPC in all its forms is evil incarnate. RPC is a dishonest mechanism, because it promises to achieve something that it simply cannot deliver - a convincing illusion that remote objects are local. This is one of the reasons for Martin Fowler's First Law of Distributed Objects ("Don't Distribute your Objects"). The opposite illusion, that all objects are remote (and that therefore, everything passed between them is a copy ("pass by value")) is achievable and honest, but could be marginally inefficient when two objects are local to each other.

System integrity is paramount, and therefore we can only use honest mechanisms, even if their efficiency is suboptimal. Messaging is an honest system, because it explicitly deals in copies ("pass by value"). Distributed systems should therefore be based on a messaging paradigm.
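To make the "messages are copies" point concrete, here is a minimal sketch in which the request and the reply exist only as bytes in a buffer, the way they would on a wire. Everything here is an assumption made for illustration: the names pair_msg, encode_pair(), decode_pair(), swap_service() and message_demo(), and the wire format itself (two 32-bit big-endian integers).

```c
#include <assert.h>
#include <stdint.h>

#define PAIR_WIRE_SIZE 8 /* two 32-bit integers; a made-up wire format */

struct pair_msg
{
    int32_t first;
    int32_t second;
};

static void put_u32( unsigned char *out, uint32_t v )
{
    out[0] = (unsigned char)( v >> 24 );
    out[1] = (unsigned char)( v >> 16 );
    out[2] = (unsigned char)( v >> 8 );
    out[3] = (unsigned char)v;
}

static uint32_t get_u32( const unsigned char *in )
{
    return ( (uint32_t)in[0] << 24 ) | ( (uint32_t)in[1] << 16 )
         | ( (uint32_t)in[2] << 8 ) | (uint32_t)in[3];
}

/* Copy the values (never addresses) into the outgoing buffer. */
void encode_pair( struct pair_msg m, unsigned char out[PAIR_WIRE_SIZE] )
{
    put_u32( out, (uint32_t)m.first );
    put_u32( out + 4, (uint32_t)m.second );
}

/* Rebuild a fresh copy from the incoming buffer.
   (Converting back to int32_t assumes the usual two's-complement behaviour.) */
struct pair_msg decode_pair( const unsigned char in[PAIR_WIRE_SIZE] )
{
    struct pair_msg m;
    m.first = (int32_t)get_u32( in );
    m.second = (int32_t)get_u32( in + 4 );
    return m;
}

/* The service sees only bytes in and bytes out. */
void swap_service( const unsigned char request[PAIR_WIRE_SIZE],
                   unsigned char reply[PAIR_WIRE_SIZE] )
{
    struct pair_msg in = decode_pair( request );
    struct pair_msg out;

    out.first = in.second;
    out.second = in.first;
    encode_pair( out, reply );
}

/* Self-checking walk-through: the caller assigns the reply explicitly. */
void message_demo( void )
{
    struct pair_msg request = { 5, 6 };
    struct pair_msg reply;
    unsigned char request_bytes[PAIR_WIRE_SIZE];
    unsigned char reply_bytes[PAIR_WIRE_SIZE];

    encode_pair( request, request_bytes );
    swap_service( request_bytes, reply_bytes );
    reply = decode_pair( reply_bytes );
    assert( reply.first == 6 && reply.second == 5 );
}
```

Because swap_service() never receives an address, it behaves identically whether the two buffers sit in the same process or at opposite ends of a network connection.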

As I said at the beginning of this post, I have delved deeper into why XML-RPC is doubly evil and SOAP-RPC is triply evil here. Hopefully it should be clear now why I'm so much against SOAP-RPC and such a fan of SOAP messaging.

Wednesday, December 26, 2007

Domain-Driven Design Really Quickly

Domain-Driven Design (DDD) by Eric Evans is one of the hottest books going around at the moment in the software development community. I believe it’s a must-read for any serious application designer or architect.

This book is an important synthesis of good design practice evolved over many person-years of experience. At 576 pages, however, this is not exactly light reading, which prompted Abel Avram and Floyd Marinescu of InfoQ to produce "Domain-Driven Design Quickly" in about a hundred pages. While a godsend to the time-poor, this is still a significantly heavy document that would take a person 2 to 4 hours to read and comprehend. What developers need is a more gentle ramp-up into the book. Hence my effort to state the gist of the latter book in one page. I call it (what else?) Domain-Driven Design Really Quickly.

I don’t pretend that this one-page summary is a substitute for reading either book. I suggest reading this page first, then reading Abel and Floyd’s 100-pager, and then engaging Eric’s book itself. Each iteration will hopefully prepare you for the next without overwhelming your mind at any stage.

I apologise if I've omitted something really important in the process or put words into any author’s mouth that he didn’t intend. Let me know of any errors by leaving a comment here, and I'll put out a revised version.

There's also the DDD book by Jimmy Nilsson (Applying Domain-Driven Design and Patterns) which I recently bought but haven't read yet. Stay tuned and I'll post my thoughts on this book as well.

My (unsubstantiated) Snapshot View of SOAP/WS-* and REST

At the moment, I haven't got the links to back me up, but I thought it's important to capture the thought before it escapes me.

I think SOAP/WS-* is an ambitious 100% vision for SOA that is 60% implemented.
And I think REST is a pragmatic 80% vision for SOA that is more than 90% implemented.

That model then explains to me the mutual contempt of both camps. The SOAP/WS-* group criticises REST for an overly simplistic model that does not cover a number of use cases - notably Quality of Service specification (security, reliability, transactions), policy-driven service interactions, process definition, exploitation of "industrial strength" message queue infrastructure, etc. That's because the SOAP/WS-* brief is all-encompassing. They've bitten off a lot.

The REST group criticises SOAP/WS-* for exactly that. They believe that SOAP/WS-* has bitten off more than it can chew. REST has taken a modest bite of the SOA problem, and has produced tangible results. What has SOAP/WS-* delivered except complexity?

A year ago, the winner of this argument would have been REST. But the SOAP/WS-* camp has been steadily chewing away at what they've bitten off. And delivering, bit by bit. Two words - Tango and WSO2. SOAP/WS-* complex? Vapourware? Not anymore.

Now it's the REST camp's turn to be defensive about their incompleteness of vision. We're beginning to hear mumblings about WADL, HTTPR and message-level security. I detect an air of defensiveness in talk about transactions and process definition. My long years of experience dealing with commercial vendors tell me that when someone questions the need for a capability, it usually means their product doesn't have it (yet). (Once they have the feature, it of course becomes indispensable and any rival product without it isn't serious competition.) Many of the RESTafarian arguments against the "unnecessary" features of SOAP/WS-* give me a feeling of déjà vu.

And oh, while REST is ahead of SOAP/WS-* on implementation, it's still not at 100%. Do let me know, for example, when PUT and DELETE are finally implemented by browsers. Ouch.

I'm not gloating that a bit of the REST smugness has been punctured (OK, just a little bit). I'm happy that the two worlds are improving their levels of maturity. In 2008, I'll be looking for REST to expand their vision, and for SOAP/WS-* to deliver on their vision.

Tuesday, December 18, 2007

What ESB Is and Isn't (A Quick Mnemonic)

Even if you remember nothing else about ESB (Enterprise Service Bus) from my earlier rant, this should be sufficient:

ESB != Expensive Shrinkwrapped Broker
ESB = Endpoints (SOAP & BPEL)

As my old friend Jim Webber likes to say, the SOAP message _is_ the Bus. It's a piece of Zen-like wisdom, and it takes a while to get one's head around it, but once you "get" it, you'll never fall for vendorspeak again. I am so over centralised brokers calling themselves ESBs.

Thursday, December 13, 2007

Paying the RESTafarians Back in Their Own Coin

Though I like REST and consider it a very elegant model for SOA, it's a little tiresome to hear day in and day out that it's so much more elegant than the SOAP-based Web Services model. In fact, I'm getting so tired of this shrill posturing that I'm going to stick it to the RESTafarians right now, in their own style. Watch.

REST exploits the features of that most scalable application there is - the web, right? REST doesn't fight the features of the web, it works with them, right? HTTP is not just a transport protocol, it's an application protocol, right? URIs are the most natural way to identify resources, HTTP's standard verbs are the most natural way to expose what can be done with resources, and hyperlinks are the most natural way to tie all these together to expose the whole system as a State Machine, right??

Great. I'd like to introduce you to something far older, more basic and more extensible than your web with its URIs, standard verbs and hyperlinks. You may have heard of the Internet. That's right, The Internet. Do you consider the Internet to be scalable, resilient, supportive of innovation? Yes? Good.

This may sound familiar, but the trick, dear friends, is not to fight the Internet model but to work with it. And what is this Internet model? In a sentence, it's about message passing between nodes using a network that is smart enough to provide resilience, but nothing more. This means that every higher-order piece of logic must be implemented by the nodes themselves. In other words, the Internet philosophy is "Dumb network, smart endpoints" (remembering of course, that the network isn't really "dumb", just deliberately constrained in its smartness to do nothing more than resilient packet routing). And the protocol used by the Internet for packet routing is, of course, Internet Protocol (IP).

Voila! We've decentralised innovation! Need a new capability? Just create your own end-to-end protocol and communicate between nodes (endpoints) using IP packets to carry your message payload. You don't need to consult or negotiate with anyone "in the middle". There is no middle. The Internet is not middleware, it's endpointware. (Thanks to Jim Webber for that term :-)

What's TCP? A connection-oriented protocol that ensures reliable delivery of packets in the right order. Did the Internet's designers put extra smarts into the network to provide this higher quality of service? Heck no, they designed an end-to-end protocol and called it TCP. What does TCP look like on the wire? Dunno - they're just IP packets whizzing around like always. TCP packets are wrapped inside IP packets. TCP is interpreted at the endpoints. That's why the networking software on computers is called the TCP stack. Each computer connected to an IP network is an endpoint, identified by nothing more than - that's right - an IP address. Processing of what's inside an IP message takes place at these endpoints. The Internet (that is, the IP network) is completely unaware of concepts such as sockets and port numbers. Those are concepts created and used by a protocol above it. The extensibility and layered architecture of the Internet enable the operation of a more sophisticated networking abstraction without in any way tampering with the fundamental resilient packet-routing capability of the core Internet.

Time for a more dramatic example. What's IPSec? An encrypted end-to-end channel between two nodes that may be routed through any number of intermediary nodes. Wow, that must have required an overhaul to the network, right? Nope, they just created another end-to-end protocol called ESP (Encapsulating Security Payload) and endpoints that understood it. Then they "layered" it between TCP and IP. Can they do that? Of course! The Internet's architecture doesn't in any way force TCP to sit directly atop IP. The "next protocol" attribute of every packet in the "TCP/IP suite" allows virtually any protocol to be layered on top of any other. So TCP packets get wrapped inside ESP packets that get wrapped inside IP packets. At the receiving endpoint, when the IP envelope is examined, there's a little note there to say that the "next protocol" is ESP, not TCP. So an ESP processing layer is given the payload of the IP packet. When ESP is through with its decryption, it hands its payload to the "next protocol", TCP. And life goes on. Any higher-level protocol that runs above TCP will be none the wiser. Is this cool or what? And with all this smart processing going on at the endpoints, what do we see on the wire? Just IP packets whizzing around as usual.
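The "next protocol" mechanism can be sketched as a toy. The protocol numbers 6 (TCP) and 50 (ESP) are the real assigned values, but everything else here is an assumption made for illustration: the one-byte header, the XOR stand-in for encryption (which is not real cryptography), and the names wrap(), toy_cipher(), send_secure(), receive() and layering_demo(). Real IP and ESP headers carry far more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PROTO_TCP = 6, PROTO_ESP = 50 }; /* assigned IP protocol numbers */
#define TOY_KEY 0x5A                    /* made-up key for the stand-in cipher */

/* Prepend a one-byte "next protocol" note to a payload (toy header). */
size_t wrap( unsigned char next_proto, const unsigned char *payload,
             size_t len, unsigned char *out, size_t cap )
{
    if ( len + 1 > cap ) return 0;
    out[0] = next_proto;
    memcpy( out + 1, payload, len );
    return len + 1;
}

/* Stand-in for ESP encryption and decryption: XOR is its own inverse. */
void toy_cipher( unsigned char *buf, size_t len )
{
    size_t i;
    for ( i = 0; i < len; i++ ) buf[i] ^= TOY_KEY;
}

/* Sending endpoint: TCP data wrapped inside ESP, wrapped inside IP. */
size_t send_secure( const unsigned char *tcp_data, size_t len,
                    unsigned char *ip_packet, size_t cap )
{
    unsigned char esp[256];
    size_t esp_len = wrap( PROTO_TCP, tcp_data, len, esp, sizeof esp );

    if ( esp_len == 0 ) return 0;
    toy_cipher( esp, esp_len );
    return wrap( PROTO_ESP, esp, esp_len, ip_packet, cap );
}

/* Receiving endpoint: peel layers by following the "next protocol" notes.
   Returns the length of the TCP data delivered, or 0 on error. */
size_t receive( const unsigned char *ip_packet, size_t len,
                unsigned char *tcp_out, size_t cap )
{
    unsigned char work[256];
    size_t pos = 0;
    unsigned char proto;

    if ( len < 1 || len > sizeof work ) return 0;
    memcpy( work, ip_packet, len );

    proto = work[pos++];                     /* note on the IP envelope */
    if ( proto == PROTO_ESP )
    {
        toy_cipher( work + pos, len - pos ); /* ESP layer decrypts its payload */
        if ( pos >= len ) return 0;
        proto = work[pos++];                 /* ESP's own note: what comes next */
    }
    if ( proto != PROTO_TCP || len - pos > cap ) return 0;
    memcpy( tcp_out, work + pos, len - pos );
    return len - pos;
}

/* Self-checking walk-through: the same receiver handles both layerings. */
void layering_demo( void )
{
    const unsigned char data[] = "hello";
    unsigned char packet[64];
    unsigned char out[64];
    size_t n;

    n = send_secure( data, sizeof data, packet, sizeof packet );
    assert( packet[0] == PROTO_ESP );
    assert( receive( packet, n, out, sizeof out ) == sizeof data );
    assert( memcmp( out, data, sizeof data ) == 0 );

    n = wrap( PROTO_TCP, data, sizeof data, packet, sizeof packet );
    assert( receive( packet, n, out, sizeof out ) == sizeof data );
}
```

Note that nothing between send_secure() and receive() needs to know that ESP exists. Only the two endpoints interpret the extra layer.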

Since you RESTafarians are so forthcoming about how the Web is superior to other forms of distributed computing, let me tell you a little bit about how the Internet is superior to other distributed computing platforms. Look at the old Telco network, called POTS (Plain Old Telephone System), the philosophical antithesis of the Internet - dumb endpoints, smart network. The telephone handsets provided by the Telcos are traditionally dumb, just dialtone-capable. All the smarts are in the network (the "cloud"). All the messagebanks, the call waiting, call forwarding, teleconferencing, everything is taken care of by the smart network. The Telcos claim that the model is least disruptive because they can upgrade capability without having to upgrade millions of handsets.

Is that so? Then how come the Internet has beaten the Telco network even in telephony? Made an overseas call using Skype lately? How much did it cost you? Did you use a webcam? Do the Telcos offer anything equivalent? I rest my case.

The Internet, with its "smart endpoints, dumb network" approach, has comprehensively beaten the Telco network with its "smart network, dumb endpoints" philosophy. It's a fact in plain view that cannot be denied. When innovation is decentralised, it flourishes. The Internet is a platform for decentralised innovation. (Heck, the very web that you RESTafarians wave in people's faces is an example of the innovation that the Internet enables. HTTP is an end-to-end protocol, remember?) You don't fight such a model. You work with it.

Now what does all this have to do with SOAP and Web Services? Everything.

Remember that when we talk about SOAP today, we're talking about SOAP messaging, not SOAP-RPC. Forget that SOAP-RPC ever existed. It was something invented by evil axe murderers and decent SOA practitioners had nothing to do with it ;-).

And another subtle point to remember is that when we say "SOAP message", we always mean "SOAP message with WS-Addressing headers". Why is this important? Because a SOAP message with WS-Addressing headers is an independently-routable unit, just like that other independently-routable unit, the IP packet. Begin to see the parallels?

Now imagine a messaging infrastructure that knows how to route SOAP messages and nothing more. We will deliberately stunt the IQ of this infrastructure and prevent it from growing beyond this capability. How do we innovate and build up higher levels of capability? Why, follow the Internet model, of course. Create protocols for each of them and embed all their messages into the SOAP message itself (specifically into the SOAP headers). There's no need to layer them as in the TCP stack, because reliable delivery, security, transactions, etc., are all orthogonal concerns. They can all co-exist at the same level within the SOAP header block - headers like WS-ReliableMessaging, WS-Trust, WS-SecureConversation, WS-Security, WS-AtomicTransaction, WS-BusinessActivity, etc.
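The idea of orthogonal headers sitting side by side can be sketched in a few lines. This is a toy model only and not SOAP or any WS-* specification: the header names "Reliability" and "Security", the structures and the functions process_headers() and header_demo() are all invented for this illustration. The point is that each header is found by name and handled on its own, so their order in the message does not matter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HEADERS 8

struct header_block
{
    const char *name;
    const char *value;
};

struct toy_message
{
    struct header_block headers[MAX_HEADERS];
    size_t header_count;
    const char *body;
};

/* What the endpoint has learned from the headers it understood. */
struct endpoint_state
{
    int reliable; /* set by the toy "Reliability" handler */
    int secured;  /* set by the toy "Security" handler */
};

typedef void (*header_handler)( const char *value, struct endpoint_state *state );

struct handler_entry
{
    const char *name;
    header_handler handle;
};

static void handle_reliability( const char *value, struct endpoint_state *state )
{
    (void)value; /* a real handler would inspect the header content */
    state->reliable = 1;
}

static void handle_security( const char *value, struct endpoint_state *state )
{
    (void)value;
    state->secured = 1;
}

/* Hypothetical table of the protocols this endpoint understands. */
static const struct handler_entry HANDLERS[] = {
    { "Reliability", handle_reliability },
    { "Security", handle_security },
};

/* Handle each header independently; returns how many were understood. */
size_t process_headers( const struct toy_message *msg, struct endpoint_state *state )
{
    size_t i, j, handled = 0;

    for ( i = 0; i < msg->header_count; i++ )
        for ( j = 0; j < sizeof HANDLERS / sizeof HANDLERS[0]; j++ )
            if ( strcmp( msg->headers[i].name, HANDLERS[j].name ) == 0 )
            {
                HANDLERS[j].handle( msg->headers[i].value, state );
                handled++;
            }
    return handled;
}

/* Self-checking walk-through: header order makes no difference. */
void header_demo( void )
{
    struct toy_message m1 = { { { "Security", "token" }, { "Reliability", "seq-1" } }, 2, "<order/>" };
    struct toy_message m2 = { { { "Reliability", "seq-1" }, { "Security", "token" } }, 2, "<order/>" };
    struct endpoint_state s1 = { 0, 0 };
    struct endpoint_state s2 = { 0, 0 };

    assert( process_headers( &m1, &s1 ) == 2 );
    assert( process_headers( &m2, &s2 ) == 2 );
    assert( s1.reliable == s2.reliable && s1.secured == s2.secured );
}
```

Adding a new capability in this model means adding one row to the handler table at the endpoints that care about it. The routing infrastructure in between is untouched.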

Now we have the plumbing required for a loosely-coupled service ecosystem. This isn't by any means the totality of what we require for SOA. At the very least, there's still the domain layer that sits on top of all this elegant message-oriented plumbing. So now define document contracts using an XML schema definition language and a vocabulary of your choice (for Banking, Insurance or Airlines), stick those conforming XML documents into the body of the SOAP messages we've been talking about, and ensure that your message producers and consumers have dependencies only on those document contracts. Ta-da! Loosely coupled components! Service-Oriented Architecture! (Note that just one language - XML - is being used to define both a core application wire protocol (SOAP/WS-*) and one or more application domain protocols (the document payloads). What does 'X' stand for in XML again?)

Now all we need is a bit of metadata to describe and guide access to these services (WSDL and WS-*Policy - some might say SSDL) and a recursive technique to create composite services (WS-BPEL - some might say SSDL again) and we're more or less done.

Sounds good, right? In fact, it sounds more than just good. This is revolutionary! But why haven't we seen this wonderful vision unfold?

A reality check. (And this is no apology or reason for RESTafarians to crow. SOAP-based Web Services technology is not a tired model about to be pushed aside by a simpler and more elegant rival. It's a sleeping tiger that is now stirring.) We don't enjoy the brave new world of SOAP-based Web Services today because:

(1) Those axe murderers we talked about earlier misled the world with SOAP-RPC for a few years, and SOA practitioners are still being detoxified. (There's a school of thought that says the use of WSDL implies RPC even today, but that has more to do with the entrenched (and entirely unjustified) exposure of domain object methods as "services". See my later post on the Viewpoint Flip which is my recommended technique to service-enable a domain model. At a syntactic level, the wrapped document/literal style also distances SOAP usage from blatant RPC encoding.)

(2) Another group of petty criminals talked up SOAP's transport-independence but only wrote HTTP bindings for it, needlessly making the SOAP visionaries look silly. (I think when AMQP arrives - in its fittingly asynchronous way, 20-odd years after TCP and IP - we'll begin to see TCP and AMQP as the synchronous and asynchronous transports that synchronous and asynchronous SOAP message exchange patterns should most naturally bind to. There's also the view that the sync/async dichotomy is artificial, and it is correlation which is key. Be that as it may, I don't see a need for SOAP to bind to HTTP at all, except for the unrelated constraint described next.)

(3) There's the pesky firewall issue that causes weak people to surrender to HTTP expediency, but I think that's another thing we will sort out in time. I think of SOAP as standing for "SOA Protocol". It's an application protocol in its own right, and it sits above infrastructure protocols like TCP, UDP and the coming AMQP. We need to define a port for SOAP itself, and get the firewalls to open up that port. No more flying HTTP Airlines just because we like the Port 80 decor. We are SOAPerman and we can ourselves fly ;-).

(4) Vendors of centralised message broker software have been pretending to embrace the SOAP/WS-* vision ("We're on the standards committees!") but in reality they're still peddling middleware, not endpointware. There's a whole flurry of vendor-driven "marketectures" intended purely to protect their centralised broker cash cows from being slaughtered by a slew (pun intended) of cheap, decentralised SOAP-capable nodes. Another round of detoxification needs to happen before people begin to understand that ESB is not an expensive broker you plonk in the middle of your network but a set of capabilities that magically becomes available when you have fully WS-* capable SOAP endpoints. This understanding will dawn, and soon.

(5) As long as commercial licensing of SOA products remains in vogue, deployments will tend to be centralised and will fail to exploit the fundamental Internet-style architecture of SOAP/WS-*. Who wants to pay for multiple instances of software when a single one will do? SOAP and WS-* have only made endpointware architecturally feasible. Open Source implementations will make it economically viable. Three projects to watch: ServiceMix, Mule and WSO2.

(6) The WS-* standards that define security and reliable messaging only recently arrived, so implementations are still being rolled out. But hey, they're here now. Take a look at any Tango-.NET interop demo and you'll be wowed by the policy-driven magic that happens.

I've put together a couple of diagrams that explain these basic concepts. This one illustrates the parallels between the SOAP/WS-* stack and the TCP/IP stack. This one shows my classification of the various SOA approaches in vogue today, with my biases clearly visible :-).

In short, I think there are only two valid approaches to building Service-Oriented Architectures today - (1) REST and (2) SOAP-messaging-using-Smart-Endpoints (and I hate both the SOAP-RPC axe murderers and the Centralised ESB marketing sharks for forcing me to use such a long qualifier to explain exactly what I mean).

And if all this didn't finally turn out to be the poke in the snoot of the RESTafarians that I intended, that's because I like REST. I just don't accept that REST is in any way superior to SOAP-messaging-using-Smart-Endpoints. As I hope I've explained in this lo-o-ong post, the latter's based on an equally elegant model - the Internet itself. As we address the essentially implementation-related deficiencies of the technology, SOAP-messaging-using-Smart-Endpoints will be seen as an equally elegant and lightweight approach to SOA.

Most importantly perhaps, the REST crowd will finally shut up.

Wednesday, December 12, 2007

Why Is REST (Seen To Be) Simpler Than SOAP-Based Web Services?

I had these points under another post earlier, but they really deserve their own post.

I really don't believe there is a huge difference between SOAP and REST or that there is a "war" between them. Yes, REST has been much simpler and more elegant than SOAP-based Web Services so far, but why do you think that's so? This is a multiple-choice question.

(a) There's still some basic confusion in the REST camp and elsewhere between SOAP messaging (the modern model of SOAP) and SOAP-RPC (the outdated view). What RESTafarians attack is the now-universally condemned SOAP-RPC model.
(b) The scope of SOAP/BPEL being much more than that of REST (Message Exchange Patterns, Qualities of Service, Processes, to name a few), the WS-* standards are understandably richer and more complex. As REST begins to move beyond the low-hanging fruit it currently targets, it'll get complex too.
(c) The vision of SOAP messaging hasn't yet been completely translated into reality. For example, SOAP is nominally transport-independent but still has no transport bindings defined other than HTTP, and WSDL imposes the unnecessary abstraction of "operation" rather than "message" as the basic primitive. Rumblings around AMQP and SSDL provide hope that the vision may yet find more optimal implementations.
(d) WS-* Standards bodies are political menageries. The spec writers have made things more complex than they need to be. Future iterations will shake out the bad specs.
(e) Things are actually settling down in the WS-* world. The core specs are done and dusted (WS-Addressing, WS-Security, WS-SecureConversation, WS-Trust, WS-ReliableMessaging). Implementations are much more interoperable. Tooling has improved. Policy-based configuration of services is becoming a reality (WS-PolicyFramework, WS-SecurityPolicy, WS-ReliableMessagingPolicy). A Tango demo will blow you away.
(f) All of the above.

I believe the answer is (f), which means that the REST folk can stop crowing. There's a place in this world for SOAP (SOAP messaging, that is), and it's right up there alongside REST. Neither is "superior". (In fact, both have areas for improvement, but I'll go into that in another post.)

Forget all the issues around complexity. I believe that in time, complexity will cease to be a differentiating factor between WS-* and REST. What will be left are two equally valid ways of looking at the world, and SOA practitioners will choose the view that is appropriate for the task at hand.

"Seven Fallacies of BPM" a Must-Read

An excellent post over at InfoQ. Jean-Jacques Dubray makes some very valid points about BPM. I agree with what he says, but my concern is less about the gaps between BPMN and BPEL and more about the duality of the REST (Resource) view and the SOAP/BPEL (Process) view. My thoughts began to go in that direction once more.

I've created a diagram to show how SOAP/BPEL and REST are two views of the same system.

If you don't agree with my last statement, what do you think of these two sentences: "The blacksmith hammers a piece of metal" and "The piece of metal is flattened into a plate"? Most people would agree they are both describing the same situation, one from the viewpoint of the "actor" (the blacksmith) using active verbs, and the other from the viewpoint of the iron itself using passive verbs. The first is operation-oriented and describes a process (SOAP/BPEL), the second is resource-oriented and describes a lifecycle (REST). Thanks to Jean-Jacques Dubray for nudging me towards that insight.
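To make the duality concrete, here is a minimal sketch in Python. The names `MetalPiece`, `hammer` and `put_state` are invented for illustration and are not part of any SOAP or REST toolkit. The same change of state is expressed once as a verb performed by an actor and once as a new state given to a resource.

```python
from dataclasses import dataclass


@dataclass
class MetalPiece:
    """The thing being worked on; its lifecycle is tracked in `state`."""
    state: str = "raw"


def hammer(piece: MetalPiece) -> None:
    """Operation-oriented view: the actor performs a verb on the piece."""
    piece.state = "plate"


def put_state(piece: MetalPiece, new_state: str) -> None:
    """Resource-oriented view: the client supplies the resource's new state."""
    piece.state = new_state


via_operation = MetalPiece()
hammer(via_operation)

via_resource = MetalPiece()
put_state(via_resource, "plate")

# Both views describe the same end result.
print(via_operation == via_resource)  # True
```

Both calls leave the piece of metal in the same state; only the vocabulary used to get there differs.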

"XML" is a First Class DataType in PHP Too

Looks like Java is the odd man out, after all. It's not just JavaScript that has a simple and powerful way to manipulate XML (E4X). I just realised that PHP has such an extension too. It's called SimpleXML. Granted, it's not as powerful as E4X (no arbitrary depth double-dot operator, no attribute filtering, no wildcard asterisk operator), but it still provides enough simplicity and power to take the pain out of XML processing.

Rajat is right. Java needs a datatype called "java.lang.XML" very soon, otherwise it may be abandoned by SOA practitioners in favour of PHP (or even server-side JavaScript, which the WSO2 Mashup Server folks are using with no apparent ill-effects).

Speaking of the WSO2 Mashup Server, those folk over at the community site virtually fall over each other to help newbies. I knew that Open Source communities were friendly, but these guys are something else. Thanks for all your help, Keith and Jonathan. I'm in shock and awe both at your enthusiastic support and the blistering pace of WSO2 development in general.

Sunday, December 09, 2007

"XML" Should be a First Class DataType in Java, or My E4X Epiphany

I've been playing with E4X for a few days now. I never realised that manipulating XML could be so easy. No heavyweight API like JAXB or its marginally simpler cousin JDOM. E4X is just so simple and natural, with XPath-like expressions being part of the basic syntax. A pity Java has nothing like this.
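For readers who haven't seen path-style XML access, here is a rough analogue. It is not E4X (which is a JavaScript extension) but Python's standard-library `xml.etree.ElementTree`, which supports a limited XPath subset. The order document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, made up for this sketch.
ORDER_XML = """
<order id="42">
  <item sku="A1"><name>Widget</name><qty>2</qty></item>
  <item sku="B7"><name>Gadget</name><qty>5</qty></item>
</order>
"""

order = ET.fromstring(ORDER_XML)

# A path expression selects nodes directly; no object model is generated first.
item_names = [name.text for name in order.findall("./item/name")]

# Attribute predicates are part of the XPath subset ElementTree supports.
gadget_qty = int(order.find("./item[@sku='B7']/qty").text)

print(item_names)   # ['Widget', 'Gadget']
print(gadget_qty)   # 5
```

E4X goes further by making such expressions part of the language syntax itself, which is the point being made above.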

When I mentioned this today to my friend and frequent co-author Rajat Taneja, he had, as usual, just one comment to make - a statement of astonishingly simple insight. "Java needs to have XML as a First Class DataType," he said. "You should be able to declare a variable as being of type 'XML', then set it to an XML document or InputStream and it should even be able to validate itself against the schema that the document references."

I know that Java 6 has vastly improved support for XML processing, but that's by bringing JAXB into the language. It's not the same thing. What Rajat says is required is a "java.lang.XML" datatype. Without it, Java just cannot cut it in the brave new world of SOA. Strong words, I know, but my recent experience with E4X has convinced me that XML manipulation has to be dead easy, because the centre of gravity of the software industry has shifted away from implementation languages like Java and C# and towards representational languages like XML (italics represent my own terminology).

If you play around a bit with WSO2 Mashup Server, you'll see what I mean.

If BPEL is the language to orchestrate verbs (SOAP operations), I would put my money on E4X as the language to aggregate nouns (XML documents that are REST resource representations or part of SOAP responses). I believe that this is the destiny of portal servers too. They must turn into mashup servers to survive, and I think a language like E4X is what they need to aggregate content before converting it into a presentation format. The latter function is the mainstay of XSLT, but so far, we haven't had a clear candidate for the former, no universally accepted Content Aggregation Language.
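As a small illustration of what "aggregating nouns" might look like, here is a sketch that combines two XML documents before any presentation step. It uses Python's standard library rather than E4X, and the payloads and the `aggregate` helper are hypothetical stand-ins for real service responses and a real mashup engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical payloads standing in for two independent service responses.
WEATHER_XML = "<weather><city>Sydney</city><temp unit='C'>24</temp></weather>"
FLIGHTS_XML = (
    "<flights>"
    "<flight no='QF1' to='London'/>"
    "<flight no='QF11' to='Los Angeles'/>"
    "</flights>"
)


def aggregate(*documents: str) -> ET.Element:
    """Combine several XML documents under one root element."""
    dashboard = ET.Element("dashboard")
    for doc in documents:
        dashboard.append(ET.fromstring(doc))
    return dashboard


dashboard = aggregate(WEATHER_XML, FLIGHTS_XML)

# The aggregate is still plain content and can be queried like any other XML.
destinations = [f.get("to") for f in dashboard.findall("./flights/flight")]
print(destinations)  # ['London', 'Los Angeles']
```

The result is still non-visual content; turning it into a presentation format is a separate step, which is where something like XSLT would come in.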

Until now, that is.

I'm pretty sure that E4X isn't going to be the ultimate tool for XML processing, but it has broken new ground and represents the next generation in XML manipulation capability. It's exciting to imagine what new technologies and tools will follow in its wake.

What we need now is an architecture to guide this new technology through its infancy - a Service-Oriented Content Aggregation Architecture. Give Rajat some time :-).

Cooking with Leftovers - The Current Portal Model

Sometimes you just need the right analogy to describe a technology, and I believe I have found the one for portal technology.

Portals are sold as a lightweight integration capability. The vendor term for it is "integration at the glass", which sounds nice but doesn't explain much. I have a more descriptive phrase now, if less flattering. I think the portal model of application integration (or aggregation, if you don't want to dignify what portals do with a term that implies a degree of robustness) is more like cooking with leftovers.

Think about it. Here are two or more independent web applications that produce some HTML (or at any rate some presentation markup), and it falls to the portal to pull them together into a single web page to make them appear like part of a larger, composite application. To my mind now, it appears that this enterprise is doomed from the start.

The first reason is that we cannot adopt this model of aggregation for any existing (read: standalone) web applications. The applications need to be specially built to work within a portal environment. They need to understand that they are not standalone applications but are part of a larger environment that they may share with others like themselves, although what that larger environment or those other applications may be, they must be entirely ignorant of.

This is actually quite a big ask. The most pressing need for lightweight aggregation of the portal kind is to tie together existing web applications, and this is something that portals cannot do, because they require their constituent applications to be written to the portal model, i.e., to be portlets. Portlets do not produce full-fledged web pages, only fragments of HTML such as <div> or <span> elements. They must also conform to the portal event model and be able to respond to "render" and "update" ("processAction") commands.

The second reason why I now believe the portal model has been doomed from the start is that all we need is a better model to come along, and the portal model will be exposed for what it is - a clumsy attempt to hitch an application wagon to two (or more) independent HTML horses that have already bolted from the content stable (Oh man, am I pleased with myself about that analogy :-).

If we're going to have to develop our constituent applications afresh using a new paradigm, and a more elegant paradigm comes along that allows us to aggregate diverse bits of content before they are cooked into a presentation format, then wouldn't we much rather adopt that paradigm? By "content", I'm referring of course to the output of a SOA-style Service Tier, a non-visual representation of application state.

In other words, who would want to cook with leftovers from previously cooked dishes when they can have access to fresh ingredients and complete freedom to mash them up in any way they choose?

Wait a minute! Did I say "mash them up"? Because that's exactly the paradigm we have before us today - content aggregation through mashup technology.

I'll be recording my thoughts on mashup technology here in the weeks to come, but for now, let me just predict that if portal servers do not offer mashup capability very soon (in addition to supporting legacy portlets), they will lose out to pure-play mashup servers.

Saturday, November 24, 2007

An Open Letter to the New Australian Prime Minister, Kevin Rudd

Congratulations on your election victory, Mr. Prime Minister! (I think I can safely address you as such even though you haven't been officially sworn in yet).

I would like to get a quick word in before you are deluged by work and a cacophony of voices representing the spectrum of Australian interests.

Your campaign promised Australia a world-class education system, with an emphasis on "digital schools" and a computer for every child in Years 9 to 12. I laud your vision and extend my support and best wishes towards its successful implementation.

I must however caution that to bureaucrats who haven't kept up to date with the latest developments in the computer industry, implementing your policy may seem to mean nothing more complicated than signing a multi-million dollar agreement with Microsoft, but there is a far more efficient use of taxpayers' money.

I would urge the Labor government to look closely at the Open Source movement not only to provide basic operating and application software for school computers, but also to provide access to courseware through an "educational commons." When you do the numbers, you will find that very significant savings can be obtained by substituting perfectly good Open Source equivalents for proprietary and expensive software such as Microsoft Windows Vista and Microsoft Office. I am talking about Linux, Firefox, OpenOffice and countless other cooperatively-developed software products. Not only are the Open Source products cheaper, they are more open and standards-compliant, which means they play nicer with software from other sources. Best of all, they run on less powerful hardware and require fewer hardware upgrades, reducing the amount of money that needs to be spent just to get the infrastructure up and running. Ongoing running costs are also likely to be cheaper. And those are just the hard (dollar) benefits.

An exposure to Open Source software and an "educational commons" will also help to create a new generation of technology- and community-savvy students who can more effectively participate in an increasingly collaborative world. Our educational investment is not in computers, after all, but in intellectual assets - our children. It's the Open Source way that our children need to learn in order to adapt and survive in tomorrow's world, and your government's policies can make that a reality.

I know that these revolutionary approaches and solutions will be bitterly resisted by the technology establishment, but if the election of your government is not a vote for change, what is?

As a voter, taxpayer and concerned citizen, I would like Australia to get the best for its investments in technology and education, and I believe the Open Source way is the best way forward for this country.

With my very best wishes for your term as Prime Minister,

Yours sincerely,
Ganesh Prasad

Wednesday, November 21, 2007

Why RPC is Evil, XML-RPC is Doubly Evil and SOAP-RPC is Triply Evil

After reading my own review of Yuli Vasiliev's book, I realised I had to better explain my "SOAP-RPC is evil" comment.

Actually, SOAP-RPC is not just simply evil but triply evil, and here's why.

The root of the whole evil was the original arrogant assumption behind the notion of distributed objects - that we can make remote objects appear as if they are local. It was only much later that Martin Fowler came out with his First Law of Distributed Objects ("Don't Distribute Your Objects").

Quick question: How can we make a remote object appear to be local?
Correct answer: We can't, period.
Naive answer: Serialise the remote object, transport it over the wire and deserialise it to create a local copy.

The reason the naive answer is so badly wrong is that copies behave very differently to the original objects. To be precise, when a copy is changed, the original is not changed. So "hiding" the remoteness of an object through some object serialisation sleight-of-hand doesn't work. The component or application that gets a reference to the copy must know that it is a copy, otherwise all kinds of application errors can occur.

So that's why RPC (Remote Procedure Call) is evil. It tries to make remote objects look local. If one is not careful, this can create serious errors in applications.
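The copy problem is easy to demonstrate. The sketch below uses Python and JSON purely as a stand-in for whatever serialisation an RPC framework performs; the `Account` class is invented for the example, and no real network is involved.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Account:
    owner: str
    balance: int


# The "remote" object, imagined to live on the server side.
remote_account = Account(owner="alice", balance=100)

# Naive approach: serialise, ship over the wire, deserialise on the client.
wire_payload = json.dumps(asdict(remote_account))
local_copy = Account(**json.loads(wire_payload))

# The client believes it is updating the real account...
local_copy.balance -= 30

# ...but only the copy has changed.
print(local_copy.balance)      # 70
print(remote_account.balance)  # 100
```

Nothing in the client code signals that `local_copy` is a copy, which is exactly how the errors creep in.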

The other issue is the serialisation mechanism. Even assuming it's OK to serialise an object and recreate it elsewhere from the serialised representation, what is the mechanism used for serialisation? Java serialisation works because we have Java on both ends. In recent times, XML has become popular, and some bright spark must have thought of "XML serialisation". The concept is simple. Convert a Java object to XML format, transport it over the wire to a remote host, then convert the XML document back into a Java object to create a perfect copy, a "remote object". The problem with this assumption is that there is an impedance mismatch between Java and XML. We can do things with Java that we can't do with XML, and vice-versa.

So converting a Java object into XML isn't straightforward. Things get lost in the translation. Converting from XML back into Java isn't straightforward either. More things get lost in the translation. So this "serialisation/deserialisation" using XML as the transport format doesn't result in perfect copies at the other end. This is XML-RPC, and when it's used naively, it's easy to see why it's doubly evil.
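Here is a minimal sketch of things getting lost in translation. It uses Python rather than Java, and a deliberately naive pair of hypothetical helpers (`to_xml`, `from_xml`), but the same kind of mismatch applies: a plain XML document without a schema carries text, not the types of the language that produced it.

```python
import xml.etree.ElementTree as ET


def to_xml(record: dict) -> str:
    """Naively serialise a flat dict: one child element per key."""
    root = ET.Element("record")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text: str) -> dict:
    """Naively deserialise: every value comes back as text."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


original = {"quantity": 3, "price": 9.5, "discount": None}
round_tripped = from_xml(to_xml(original))

print(round_tripped)
# {'quantity': '3', 'price': '9.5', 'discount': 'None'}
print(round_tripped == original)  # False
```

The integer, the float and the null all come back as strings, so the "copy" at the other end is not the object that was sent.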

As SOAP-RPC evolved beyond XML-RPC, the wire protocol itself began to be seen as a "contract" between systems. In other words, the SOAP message could be used to decouple implementations at either end. All of a sudden, the XML document going over the wire was not expected to be just a Java serialisation mechanism; it was a technology-neutral serialisation mechanism. So now it became fashionable to think of serialising a Java object into an XML document, wrapping it in a SOAP envelope and "exposing" this as a contract to another system. This other system could pull in the SOAP message, extract the XML document from within it, then deserialise it into...a C# object! Not only is the impedance mismatch alive and well here too, there is another subtle assumption made that is violated in the execution. Can you find it?

Let me sum up.

RPC is evil because it makes developers think they can make a remote object appear local.

XML-RPC is doubly evil because it makes developers think (i) they can make a remote object appear local and (ii) it is possible to convert a Java object to XML and back without error or ambiguity.

SOAP-RPC is triply evil because it makes developers think (i) they can make a remote object appear local, (ii) it is possible to convert a Java (or C#) object to XML and back without error or ambiguity and (iii) even though the SOAP message travelling between two systems is now a "contract" to be honoured regardless of changes to implementations, it can still be generated from implementation classes, and implementation classes can be generated from it.

What's the truth, then?

Nothing glamorous or counter-intuitive.

1. A copy is a copy, not the original.
2. XML is XML and Java is Java (and C# is C#).
3. Contracts are First Class Entities, just like Domain Objects in an OO system.

SOAP-based Web Services technology today is sadder and wiser because it has internalised these truths. Because of Truth 1, we don't pretend anymore that a SOAP message is a Remote Procedure Call. We know it's just a message. Because of Truths 2 and 3, we don't believe anymore that we can generate XML documents from Java/C# objects, or Java/C# objects from XML documents. We can only map data between these two forms. If you build your Web Services abiding by these principles, you can stay out of trouble, otherwise you will be left wondering why your systems are so brittle.
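What "mapping" rather than "generating" might look like is sketched below, in Python for brevity. The `Customer` class, the `CustomerDetails` message and both mapper functions are hypothetical. The point is only that the message shape is written down independently of the class, and hand-written code translates between the two.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Customer:
    """Internal domain object; free to change with the implementation."""
    full_name: str
    loyalty_points: int


def customer_to_message(customer: Customer) -> str:
    """Map the domain object onto the agreed message shape."""
    msg = ET.Element("CustomerDetails")
    ET.SubElement(msg, "Name").text = customer.full_name
    ET.SubElement(msg, "Points").text = str(customer.loyalty_points)
    return ET.tostring(msg, encoding="unicode")


def message_to_customer(xml_text: str) -> Customer:
    """Map an incoming message back onto the domain object."""
    msg = ET.fromstring(xml_text)
    return Customer(
        full_name=msg.findtext("Name"),
        loyalty_points=int(msg.findtext("Points")),
    )


message = customer_to_message(Customer("Ada Lovelace", 120))
print(message)
# <CustomerDetails><Name>Ada Lovelace</Name><Points>120</Points></CustomerDetails>
```

If `full_name` is later renamed or split into two fields, only the two mapper functions change; the message that consumers see stays the same.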

(Actually, there are still people who don't understand these Truths, and who still go on about SOAP-RPC. That's what makes me despair about SOAP's chances for success.)

Tuesday, November 20, 2007

Book Review: "SOA and WS-BPEL" by Yuli Vasiliev

I received a SOA book for review today. It’s called “SOA and WS-BPEL” by Yuli Vasiliev. The publisher is Packt Publishing.

I’ve only had time to skim through the book once, but I think I’ve understood enough to be able to post some initial thoughts on it. As I read it in greater detail, I may refine this post into a proper review.

What I like about the book:

The choice of PHP as an implementation language: This is a refreshing reminder that SOA is about hiding service implementation details from service consumers. The implementation language doesn’t always have to be Java or C#. What matters is what the service consumer sees (i.e., SOAP). There is however, a downside with PHP, which I'll cover in a moment.

The use of Open Source technologies: I believe that Open Source is the way of the future, and the use of Apache/PHP, Tomcat and ActiveBPEL to illustrate SOA concepts feels just right. Besides, readers can readily try out the examples in the book without having to buy expensive commercial software. (An exception is Oracle, which the author discusses in some detail in the context of data-centric services. Perhaps it’s the Oracle bias he has by virtue of having worked extensively with Oracle technology in the past ;-).

Copious examples: This is not a theoretical book, and there is plenty here for the reader to try out for themselves. Every concept that the author deals with is represented in code. [I can’t comment on how complete or correct the examples are and whether they work, because I haven’t tried them out yet.]

Lots of diagrams: Pictures are really worth thousands of words, and there are diagrams sprinkled liberally throughout the book to illustrate almost every concept discussed. I thought they were quite decent.

Emphasis on data: One of my pet peeves with many people’s approach to SOA is their relative neglect of the Data Interchange view of service interactions. SOAP-based web services are about exchanging XML documents that represent some structured data relating to the operation being performed. There is a lot of design that needs to go into these XML documents. Thankfully, the author spends a fair amount of time showing how to design the data payload of messages with XML schema and converting data into XML format (but a nit about that bit later!). I also liked the treatment of the data within the contract (importing the schema file into the WSDL file instead of defining the schema in place).

Introduction to WS-Security and to Qualities of Service (QoS): The book has a section on implementing secure messaging using WS-Security, and also makes the point that virtually all the WS-* specifications use SOAP headers to implement functionality. There’s not too much here, but enough to get the developer to understand the WS-* approach.

View of a process as a service in its turn: One of the value propositions of SOA is its ability to exhibit a “flat” landscape of services, regardless of how they were implemented. From a service consumer’s point of view, it doesn’t matter if a service was purpose-built in a programming language (e.g., PHP) or stitched together out of other services (using BPEL). Services of both types should look the same. The book shows how composite services can also be exposed as services in their turn, with the appropriate WSDL sections highlighted.

WS-BPEL treatment at the right level of detail: I thought the level of discussion and the examples of WS-BPEL were just right for a beginner. There is enough detail to be meaningful, but not so much as to overwhelm.

What I don’t understand or don’t like in the book:

The title: SOA and WS-BPEL are like fruit and oranges, not even apples and oranges. The first is an architectural approach; the second is a language used to implement processes. Considering that the book deals with building standalone web services in the first part, then composing them into processes in the second, perhaps it should have been called “SOAP and WS-BPEL” or “SOA: SOAP and WS-BPEL”.

The unquestioning acceptance of the RPC view of Web Services: I have religious feelings about RPC. It is the devil’s spawn. Many of the REST camp’s arguments about SOAP are actually directed against SOAP-RPC. The modern view of SOAP-based Web Services is based on messaging. Messaging, not RPC.

What’s the difference, and why is this important? RPC is architecturally dishonest. It is impossible to make a remote object behave like a local one, and I don’t mean the effect of network latency. A reference to a local object that is passed to an application carries with it the promise that any change made by the application using the reference will change the object. But with RPC, what is passed to the application is not a reference to the object, but a reference to a copy of the object. This is not an insignificant difference. When the application makes a modification using the reference, the local copy is changed, not the remote object. But the application thinks the actual object has been changed. That’s what is so dishonest about it.

Messaging turns this essentially hopeless exercise around. It makes local objects look remote, by always passing copies around, even if the actual object is accessible by a reference. The application is under no illusion. It knows that in order to make a change to the real object, it is not enough to make a change to the copy. Either the copy must be passed back to be synchronised in some sense with the original, or an independent operation to pass a Data Transfer Object is required. This is architecturally honest and clean. What it may lose in efficiency in some corner cases (local access), it more than regains in terms of robustness, flexibility and scalability.
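A minimal sketch of the messaging discipline, in Python: the hypothetical `AccountService` below only ever hands out and accepts copies, even though caller and service share a process here. Changing the real object takes an explicit second message.

```python
import copy
from dataclasses import dataclass


@dataclass
class Account:
    owner: str
    balance: int


class AccountService:
    """Hypothetical service that never exposes references to its own state."""

    def __init__(self) -> None:
        self._accounts = {"alice": Account("alice", 100)}

    def get(self, owner: str) -> Account:
        # Always return a copy, so callers cannot mutate service state.
        return copy.deepcopy(self._accounts[owner])

    def update(self, changed: Account) -> None:
        # The caller must send the changed copy back explicitly.
        self._accounts[changed.owner] = copy.deepcopy(changed)


service = AccountService()

snapshot = service.get("alice")
snapshot.balance -= 30
balance_before_update = service.get("alice").balance  # still 100

service.update(snapshot)
balance_after_update = service.get("alice").balance   # now 70

print(balance_before_update, balance_after_update)  # 100 70
```

The extra copying costs something for local callers, but the application can never mistake a snapshot for the real thing.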

That’s what SOAP messaging brings to the table. SOAP-RPC is evil and should have nothing to do with a book on modern Web Services. It’s a pity the author actually mentioned RPC by name when introducing SOAP messaging, because the actual examples do not assume RPC.

In this context, the use of PHP has a downside, as I indicated earlier. PHP is heavily tied to HTTP and by extension, to synchronous request/response semantics. SOAP-based Web Services technology does not inherently have this constraint and can work with asynchronous transports as well. The book could have illustrated this effectively using a message queue example. There are tools such as PHPMQ and Mantaray that make this possible, as this example shows.

Automatic data transformation to and from XML: This is another of my pet peeves. The service contract (of which the XML document forms a part) is a First Class Entity. So are the classes that make up the internal Domain Model. How can one ever be generated from the other using a tool? Code generation is an example of tight coupling, and if two First Class Entities are tightly coupled, then at least one of them is not a First Class Entity. QED. I’m not sure if there is an equivalent to TopLink or JiBX in the PHP world, but such a mapping tool is what is required to transform data between the PHP and XML worlds. To be fair, this book is not alone in propagating the code generation approach. Virtually the entire Java Web Services industry is consumed by JAXB disease.

Data-centric Web Services: Actually, I didn’t understand the point of this as a separate topic. It’s a special case of service implementation. In fact, I have a totally different view of Data-Centric Web Services. I call them REST.

No high-level view of process as an aspect of the business: For a book that purports to be on SOA, the treatment of composite services and processes is surprisingly low-level and technology-oriented. There should have been an introduction that focused on business processes and their decomposition into services. In fact, SOA best practice is all about business process modelling and re-engineering. That’s how architects and business analysts determine the services that are required and their granularity. Proceeding bottom-up from services and composing them into processes, as the book seems to suggest is the way to do SOA, is ingenuous.

Anaemic index: The index of the book doesn't list many of the things discussed inside. I tried going back a couple of times to look up something I had seen earlier, but the index was of no help.

Other comments:

ActiveBPEL Designer is not an Open Source product, merely free. This isn’t the fault of the author. It’s just something I’m personally sad about. I haven’t yet found a truly Open Source BPEL designer that is powerful and friendly, and generates full-featured BPEL.

Overall comments:

Actually, notwithstanding the negative comments I made (I'm a nitpicker, as my wife will attest), this is a pretty decent book on Web Services (SOAP and WS-BPEL). It’s got enough low-level detail to help developers get their hands dirty and understand the technology by actually building services and processes. The choice of PHP could turn out to be a masterstroke by reaching beyond Java or C# developers and appealing to the vastly more populous LAMP community. Time will tell. I thought the book was a bit light on architectural insight, but maybe it’s for the best. For a developer audience, such discussion might just cause eyes to glaze over.

Tuesday, November 13, 2007

SOFEA and SOUI - There is a Difference, After All

Great minds think alike, or so it would appear :-).

Just a month after we published our article "Life above the Service Tier" describing SOFEA (Service-Oriented Front-End Architecture), Nolan Wright and Jeff Haynie have proposed an architecture they call SOUI (Service-Oriented User Interface). (Independently, Roger Voss blogged that web frameworks are peaking towards obsolescence.)

It's worth repeating what we said in the Conclusion section of our paper:

"Although it seems presumptuous on our part to claim that we have “solved” the end-to-end integration problem, what is probably true is that recent paradigms and technology breakthroughs have brought the SOFEA model closer to conceptualisation, and someone or the other was bound to suggest it. It happened to be us."

Well, clearly not just us. Many others are saying the same thing.

Matt Raible wondered if there was any difference between the two frameworks (SOFEA and SOUI). SOUI proponents Nolan and Jeff haven't just proposed a theoretical architecture, they've actually created a set of tools to help developers build to this model, and what's more, this set of tools is available for Java, PHP, Ruby and .NET. It's called Appcelerator, and it's a pretty impressive piece of work.

Examining Appcelerator (and it would be a fair assumption that this is what Nolan and Jeff intend SOUI to look like), it appears to satisfy all the conditions that we proposed for SOFEA, except one.

SOFEA emphasises the use of XML for Data Interchange, because one of its guiding principles is to mesh seamlessly with the service tier. Services may be built using either the REST paradigm or the SOAP messaging paradigm, but XML plays a big role in either one. SOAP requires, and REST recommends, the use of XML for Data Interchange. XML has the essential characteristic of being able to enforce data integrity (i.e., conformance to specified data structures, data types and data constraints). That's why we make a big deal about XML support in SOFEA. We considered but rejected JSON because it's only slightly better than raw HTML-over-HTTP in these respects.

Appcelerator uses JSON, not XML. So that seems to be the big difference between the two models. SOFEA requires the use of XML for Data Interchange. SOUI prefers the more lightweight JSON.

I guess Nolan and Jeff have made a valid architectural choice favouring ease-of-use over data integrity, but better XML tooling may blunt that advantage in future.

Incidentally, Ye Zhang made a comment about SOFEA on Matt Raible's blog to the effect that we seemed to be chasing buzzwords and hence tacked on the "Service-Oriented" prefix to our model. Not true. Service-Orientation is at the heart of our model, because we approached the Presentation Tier from that angle. Our emphasis on XML as the Data Interchange mechanism for SOFEA was designed to enable the Presentation Tier to interoperate with the Service Tier with no impedance mismatch at all.

At any rate, the software industry seems to be at the start of a new era, and I'm happy we were able to make a contribution to the debate.

Tuesday, October 23, 2007

SOA Technology Best Practice

Let me formalise what's on my mind regarding SOA best practice.

1. Build your core application using the principles of Domain-Driven Design. This is your service implementation. Hint: If using Java, Spring/JPA is the way to go. EJBsaurus is dead.
2. Take the help of an off-the-shelf domain model like IFW for Banking and IAA for Insurance. This will let you build incrementally without experiencing refactoring pain as your model grows.
3. If the way your clients refer to your services emphasises verbs over nouns ("Jelly this eel"), you may want to look at SOAP-based Web Services, which are operation-oriented services. If your clients emphasise nouns over verbs ("Buy a ticket to the Spice Girls concert"), consider REST-based Web Services, which are resource-oriented services.
4. Model your interface contract using XML schema in either case. Layer SOAP or REST Data Interchange patterns over this basic XML document foundation.
5. Follow Contract-First Design. Do not generate your service interface from your implementation classes, even if your vendor gives you cool tools to do so painlessly (Hint: the pain comes later, when every change to your implementation breaks your service contract).
6. Use a meet-in-the-middle mapping approach to connect the XML documents in the contract to the domain model. Do not generate implementation classes from the service interface. The self-important JAXB library is a bad bug that's going around. Use JiBX or TopLink instead.
7. When building the Presentation Tier of an application that consumes services, use the SOFEA model.
8. If you want to do content aggregation using 2007 thinking, consider a mashup. If still stuck in 2003, use a portal server and jump through the associated hoops.
9. You don't need an ESB product, not now, not ever. Learn to see the "bus" in the way services are built and invoked. Remember that the web with its smart endpoints has comprehensively beaten the telco network with its smart network approach, even in voice communications. If you must use an ESB, deploy it in a decentralised, "smart endpoints" configuration. Obviously, a decentralised ESB is economically viable only with an Open Source product. A commercial one will force you to deploy it in a centralised architecture to save on licence costs (blech!).
10. You don't need an ESB product to do process orchestration either. Use BPEL engines at endpoints. If you need human workflow, use BPEL4People.
11. Open Source is the way of the future. Wake up and smell the coffee. For SOAP-based Web Services, consider Sun's Metro (formerly Tango) SOAP stack. For REST, consider Sun's Jersey (Java-based) or SnapLogic (Python-based). For an ESB (decentralised, of course), consider ServiceMix/CeltiXFire/FUSE. For process orchestration, including human workflow, check out Intalio.
12. Management of your services can be centralised even with decentralised services. That's not an argument for a centralised ESB either.
13. Take vendor education with a pinch of salt. Talk to fellow practitioners, especially the bold and the irreverent.

Thursday, October 18, 2007

A Java-filled Day in Sydney

Wednesday the 17th October, 2007.

Sydney had a surfeit of Java today. First, there was the all-day Sun Developer Day at the Wesley Theatre on Pitt Street. And then after that, the Sydney Java Users Group had a meeting at the same place where Mike Keith (of Oracle and the EJB3 spec committee) spoke about JPA.

Where do I begin? I guess my takeaways were that we shouldn't write off NetBeans or Glassfish yet. There was a time, I admit, when I was one of those who thought Sun should just give up on building its own IDE and app server. I may have been wrong. The upcoming NetBeans 6.0 seems quite cool, not least because they seem to have shamelessly stolen a lot of IntelliJ IDEA's cool features.

Glassfish doesn't seem bad either. It's no longer a toy. It's clusterable, for one. And Sun's Project Tango (interoperability with Microsoft's implementation of Web Services in .NET 3) is now part of Glassfish. That means the Java world gets an advanced Web Services platform absolutely free. That should count for something. And Glassfish v3 is supposed to start up in under a second, so it should really wow people when it debuts.

There was a great talk on building RESTful Web Services by Sun's Lee Chuk Munn, which has inspired me to try and download Jersey and give it a spin. Should I look at Phobos as well - a JavaScript engine for a RESTful server? Decisions, decisions.

Angela Caicedo was another Sun evangelist who spoke very well about new features in Java 6 and 7, and also about building mobile applications using Java, AJAX and SVG. I'm just a bit disappointed that a mechanism I thought of but didn't widely publicise is now being used for exposing Web Services. The Java 5 concepts of Responses and Futures, as a way to turn verbs into nouns (getting systems to turn what would otherwise be standalone "methods" into classes, which are naturally first-class objects), are at the heart of this.

Glassfish is built on top of the java.nio.* package, which makes it very fast, I believe.

In the evening, I attended Mike Keith's talk on JPA. I'm pleasantly surprised to see that I'm not the only one advocating a mapped-data binding approach. Mike told us about Eclipse MOXy, which provides a "meet-in-the-middle" approach. I wonder if they're aware of JiBX.

Wednesday, October 10, 2007

Services, Persistence and a Rich Domain Model

I see so many wrong-headed ideas about SOA and Web Services that I want to scream.

Here are just some:

1. Using the RPC style of SOAP Web Services (Wake up, Rip Van Winkle, it's 2007!)
2. Thinking that buying an Integration Broker or ESB will itself solve your integration problems (Sure, your shiny telephone lets you dial any international phone number, but does the person at the other end speak your language?)
3. Generating WSDL files and Web Service endpoints automatically from Java classes (So what happens to your service contract when you next change some aspect of your implementation?)

It's the last of these points that I want to talk about in this post.

On the face of it, the problem can be stated in a deceptively simple manner, but there are significant subtleties to it.

The application you are building uses (say) Java (it could be using C# and .NET for all we care). The way we have decided to interact with other systems is through Web Services. That means SOAP messages will be exchanged between our systems. Since the document/literal style of SOAP-based messaging requires an XML document to be embedded within the SOAP body, we can see that the service contract is basically a set of XML documents.

So our problem can be simplistically stated as follows:
How do we convert between Java objects used within our system and XML documents exchanged with other systems?

There are two "obvious" answers to this question - either generate the XML document(s) from your Java classes, or generate Java classes from the XML documents. Java IDEs tend to take the former approach. In fact, many of them go a step further by not only generating XML documents but SOAP endpoints as well, and WSDL definitions of the SOAP endpoints that you can conveniently distribute to your client systems. As I suggested earlier, this approach is wrong because it breaks the service contract (by generating a new WSDL file) whenever the Java classes or method signatures change.
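
To make the breakage concrete, here is a minimal sketch in Python (chosen only for brevity; the reasoning is the same in Java). The `generate_xml` function is a hypothetical stand-in for a code-first generator, not any real IDE tool, and the `Account` classes and field names are invented for illustration. Because element names are derived from field names, a purely internal rename changes the document that clients see.

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET


def generate_xml(obj, root_tag: str) -> str:
    """Hypothetical code-first generator: element names mirror field names."""
    root = ET.Element(root_tag)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")


@dataclass
class AccountV1:
    holder: str
    balance: int


# An internal refactoring: the field `holder` is renamed to `owner`.
@dataclass
class AccountV2:
    owner: str
    balance: int


before = generate_xml(AccountV1("Emily", 100), "Account")
after = generate_xml(AccountV2("Emily", 100), "Account")

# Clients that were written against <holder> no longer find it.
print(before)
print(after)
```

Nothing about the business meaning changed between the two versions, yet the generated document did, which is the sense in which the contract is "broken" by an implementation change.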

There are also a bunch of tools that take the latter approach. Apache's ADB, XMLBeans and the ubiquitous JAXB library that comes bundled with your JDK all try and generate their versions of a set of Java classes that they believe represent your XML document. Looking at the resultant code can make you lose your appetite very quickly. I'm surprised how so many people gamely continue even after looking at the results of what is very patently a wrong approach.

(This is a blog. Bloggers are meant to be opinionated.)

I believe I know how the problem should really be tackled. Both answers above are wrong. Neither the XML document nor the Java classes must be generated from the other. The Service Contract (represented by the XML document) and the Domain Model (represented by the Java classes) are both First Class Entities. By this, I mean that both are concepts that stand independently. The Service Contract is something that is negotiated between service provider and consumer. It must have no implied dependencies upon how the service is implemented. The Domain Model, on the other hand, represents the designer's deep understanding of the business application and how it really works. This should be independent of how other systems may want to interact with it.

And so my answer to the problem is that we need a third entity, a mapping entity, that sits outside both the Service Contract and the Domain Model and maps them to each other in as flexible and bi-directional a manner as possible. We have such tools today. Their names are Castor and JiBX.

The tragedy of the Java-XML translation space today is not that most practitioners seem to think it is a solved problem. It is. The tragedy is that they think the solution is JAXB. The correct answer is JiBX, and they get no points for getting the letters a bit jumbled. (Think mapping, not code generation.)

Mapped data binding that respects the independence of the two entities it maps is the correct solution approach to Java-XML translation.
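
A rough sketch of the idea, again in Python for brevity. This is not JiBX's or Castor's API; `CUSTOMER_MAPPING`, `marshal` and `unmarshal` are hypothetical names made up for illustration. The point is only that the mapping is a third artefact: the domain class knows nothing about XML, the XML element names know nothing about the class, and either side can change as long as the mapping is updated.

```python
import xml.etree.ElementTree as ET


class Customer:
    """Domain class: knows nothing about XML."""

    def __init__(self, full_name: str, credit_limit: int):
        self.full_name = full_name
        self.credit_limit = credit_limit


# The mapping lives outside both the contract and the domain model.
# Each field entry: (XML element, object attribute, to-text, from-text).
CUSTOMER_MAPPING = {
    "root": "CustomerRecord",
    "fields": [
        ("Name", "full_name", str, str),
        ("CreditLimit", "credit_limit", str, int),
    ],
}


def marshal(obj, mapping) -> str:
    """Domain object -> XML document, driven only by the mapping."""
    root = ET.Element(mapping["root"])
    for element, attribute, to_text, _ in mapping["fields"]:
        ET.SubElement(root, element).text = to_text(getattr(obj, attribute))
    return ET.tostring(root, encoding="unicode")


def unmarshal(xml_text: str, mapping, factory):
    """XML document -> domain object, using the same mapping in reverse."""
    root = ET.fromstring(xml_text)
    kwargs = {
        attribute: from_text(root.findtext(element))
        for element, attribute, _, from_text in mapping["fields"]
    }
    return factory(**kwargs)


doc = marshal(Customer("Emily", 5000), CUSTOMER_MAPPING)
copy = unmarshal(doc, CUSTOMER_MAPPING, Customer)
```

Renaming `full_name` in the class now means editing one entry in the mapping; the `<Name>` element that clients depend on stays put.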

There is a second reason why mapped data binding works so well, and it has to do with what is called an impedance mismatch. Impedance mismatch is a term borrowed from Electrical Engineering, and refers to the difficulty of translating concepts from one paradigm to another. Classes in Object-Oriented systems and tables in Relational Databases have an impedance mismatch. Classes and XML schemas have an impedance mismatch.

The Object-Relational impedance mismatch has been satisfactorily overcome for a few years now by ORM (Object-Relational Mapping) tools. Hibernate, TopLink JPA and OpenJPA (formerly Kodo) come to mind. You will notice that these tools take an approach similar to the one I suggested for Java-XML translation. They respect both paradigms and can map existing instances of each using mapping files that sit outside both.

So overcoming the Java-XML impedance mismatch is a second good reason to use a mapped data binding approach like JiBX or Castor.

[I'm told that JiBX makes the use of XML as a data interchange mechanism as fast as native RMI. I just love it when the right way to do something is also made attractive.]

My third and final way of looking at the problem is from the traditional Java view of Domain Objects and Data Transfer Objects. Data Transfer Objects have never been considered true objects because they only encapsulate state, not behaviour. However, they serve a useful purpose as a data interchange mechanism. It's considered bad practice (except by the Hibernate crowd and their queer notion of detached entities) to directly expose your Domain Objects to the outside world. You should decouple your implementation from your service interface. This decoupling is done by using Data Transfer Objects (DTOs) as very specialised data structures visible to other systems through the service interface. DTOs are marshalled from Domain Objects and unmarshalled back to Domain Objects during service interactions. This is a paradigm well known to most J2EE developers. Looked at in this way, XML documents used in Web Services are just DTOs.

The best technology to marshal and unmarshal Java DTOs is Dozer, just as the best technology to marshal and unmarshal XML DTOs is JiBX.

I see a beautiful symmetry in the way all these tools and technologies inter-relate. At the centre is the rich Domain Model that implements your application. To persist the application's state, you use Hibernate or another ORM tool to map your Domain Objects to relational tables. To talk to remote Java clients, you map your Domain Objects to DTOs using Dozer. To expose services to clients that require a Web Services interface, you use JiBX or Castor to map your Domain Objects to XML documents.
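
The symmetry can be sketched in a few lines. This is a toy in Python, not Hibernate, Dozer or JiBX; the `Account` class and the three `to_*` functions are hypothetical, hand-written stand-ins for what those tools would do from external mapping definitions. One rich domain object sits at the centre, and three independent mappings project its state outward.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


class Account:
    """Rich domain object: state plus behaviour."""

    def __init__(self, number: str, balance: int):
        self.number = number
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


@dataclass(frozen=True)
class AccountDTO:
    """State only, no behaviour: what remote clients get to see."""
    number: str
    balance: int


def to_row(account: Account) -> tuple:
    """Persistence mapping (the ORM's job): object -> relational row."""
    return (account.number, account.balance)


def to_dto(account: Account) -> AccountDTO:
    """Remote-client mapping: object -> DTO."""
    return AccountDTO(account.number, account.balance)


def to_xml(account: Account) -> str:
    """Web Services mapping: object -> XML document."""
    root = ET.Element("Account")
    ET.SubElement(root, "Number").text = account.number
    ET.SubElement(root, "Balance").text = str(account.balance)
    return ET.tostring(root, encoding="unicode")


account = Account("12-345", 100)
account.withdraw(30)  # behaviour stays inside the domain model
row, dto, xml_doc = to_row(account), to_dto(account), to_xml(account)
```

None of the three outward forms carries `withdraw`; behaviour never leaves the domain model, only state does.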

This picture should be worth the thousand words above.

Sunday, October 07, 2007

Who is the best Vendor of Collaboration Technology?

Collaboration is a hot topic in the industry today, and every organisation wants to set up an advanced collaboration capability for their employees and customers. My personal view is that collaboration has more to do with culture than technology. So if an organisation's culture is not inherently collaborative, merely buying collaboration technology will leave a hole in their budgets but not actually achieve very much.

But catch vendors telling customers that.

Listening to them tell it, the key to success in collaboration capability seems to be in implementing the most integrated "stack" of products you can find. The theory goes that if you find a vendor with a rich and integrated stack of "collaboration" products, buying and deploying that stack will lead to collaboration nirvana (canned laughter here).

It's funny enough seeing stodgy, hierarchical organisations trying to acquire collaboration capability without becoming any less stodgy or hierarchical, but just look at who the vendors are!

You shouldn't be surprised to find that the "leaders" in the collaboration space (self-styled or anointed as such by analyst firms with backward-facing crystal balls) are large, stodgy and hierarchical software vendors. Our usual suspects IBM and Microsoft lead the pack. Blind leading the blind is the expression that leaps to mind.

Face it, buying collaboration technology from large software vendors is like importing voting machines from dictatorships. What do these guys know about this stuff anyway? They wouldn't recognise it if it bit them in a sensitive area. I bet they wouldn't be prepared to face its implications in their own organisations/countries, but they don't seem to mind pushing it onto others. (OK, that's a bit unfair, because many technical people in companies like IBM and Microsoft do collaborate quite effectively, but I'm talking about a larger organisation-wide culture.)

I believe that organisations that want to make a success of collaboration must (1) ensure first that their cultures are collaborative and (2) source their collaboration technology from collaborative communities, e.g., the Open Source community.

Collaboration is what Open Source communities do all the time. Newsgroups and mailing lists, IRC, blogs and wikis, RSS and Atom, mashups: all were invented out of the necessity of collaborating across far-flung communities of peers. These products and technologies are truly collaborative because the people that built them use them for true collaboration, not the sanitised, managed and controlled collaboration that corporate bosses have in mind.

And the "stack" theory is my pet hate. In my evolved view (and I say this without vanity), the best integrated products are not those built as a "family of products" by a single entity, but those built by independent groups that operated in a decoupled way and had no hidden agendas about locking out their competitors. That's why you can run a Linux desktop with a Firefox browser, a MySQL database and your own local Apache web server running PHP, and build applications using this suite of products without having to think about where they came from. The Linux community, the Mozilla Foundation, the Apache Software Foundation, the PHP community, the MySQL community and MySQL AB have all collaborated (yes, that's the word) to deliver to you, the developer, a seamlessly integrated set of products. Is it a monolithic "stack"? Nonsense. The myth that software products need a common brand to be interoperable is just that: a myth. Open Source software groups don't have ulterior objectives of locking each other out. That's why interoperability happens: because users want it.

Will decision makers get this? I believe it's an ongoing process of learning and maturity. Hopefully the presence of Generation Y in the workforce will teach the rest of us how to really collaborate. And put away those chequebooks, because the best things in life really are free.

Life above the Service Tier

Or "How to Build Application Front-ends in a Service-Oriented World"

(This is a paper I wrote with two co-authors, Rajat Taneja and Vikrant Todankar, the same team that critiqued the EJB 3.0 spec in 2004.

You can download and read the paper and an accompanying presentation.

Update Sep 2008: How SOFEA fits into a SOA Ecosystem)

But let me talk a bit about it here.

When we look at the Business Logic Tier, although the industry hasn't reached nirvana yet, the story seems to be coming together rather well. The Service-Oriented Architecture (SOA) message is being heard, with its concepts of loose coupling between systems, removal of implicit dependencies and specification of all legitimate dependencies as explicit contracts. The SOA model yields flexibility and uniformity through the specification of a definite Service Interface, which hides domain models and implementation technologies.

Unfortunately, no such clear architectural blueprint exists for the Presentation Tier that lies just above the Service Interface. There are a dozen different technologies available here. There are also the thin-client and rich-client models to choose from. Which of these is the "right" one to use? What does industry best practice say about developing the front-end of an application? What is the best way to connect the front-end neatly into the Service Interface? Developers are confused and left without guidance.

And while on the subject of connecting the front-end to the Service tier, an additional source of confusion is the fact that the Service Tier may itself follow one of two different models - SOAP-based Web Services or REST. Which one should we target? We believe that both these models will continue to co-exist, and organisations will have legacy business services in both these models. Therefore a good Presentation Tier technology will be capable of interfacing with both SOAP and REST, not just one of them. It is important to note that both SOAP and REST use (or are capable of using) XML documents for data interchange.

When we put on our architects' hats and look at the Presentation Tier, we see that regardless of whether it is a thin client or a rich client, three essential processes always take place.

The first is what we call Application Download. In order to be able to run an application on a client device, it needs to be delivered to the device in some fashion.

When the application is in use, a sequence of screens is typically seen. We call this Presentation Flow.

Finally, any non-trivial application ultimately exchanges data with some back-end resource, a process we call Data Interchange. This is usually the crux of the application's functionality.

To reiterate, the three processes occurring in the Presentation Tier of any application are Application Download (AD), Presentation Flow (PF) and Data Interchange (DI).

In thin-client applications, the web server plays a part in all three processes. Application Download occurs piecemeal, in the form of individual web pages served up by the web server. The web server also drives the application's Presentation Flow, usually with the help of a web framework like Struts or Spring MVC. Finally, the web server acts as an intermediary for Data Interchange between browser and application server.

We believe that the thin-client model as described above suffers from at least three major architectural flaws. Unless these flaws are addressed, we cannot arrive at a clean architectural blueprint for the Presentation Tier.

The first flaw is that the thin-client model does not respect data. Look at the data that goes from the browser to the web server. There is no data structure. It's just a set of name-value pairs. There are no data types. Everything is a string. And there are no data constraints. We can put any values into those name-value pairs.

Look at the data that comes back from the web server to the browser. It's highly presentation-oriented, marked-up data. The thin-client model does not respect data as data. We believe we can do much better than this in this day and age.
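
A small illustration of the browser-to-server half of this, using Python's standard `urllib.parse` to decode a form-encoded request body. The field names and values are invented; the point is what arrives at the server.

```python
from urllib.parse import parse_qs

# What a form submission looks like on the wire (hypothetical fields).
form_body = "amount=abc&quantity=3&quantity=-7"

submitted = parse_qs(form_body)
print(submitted)

# No structure: a flat bag of names.
# No types: every value is a string, including the "numbers".
# No constraints: "abc" is accepted as an amount, -7 as a quantity,
# and nothing stops a field from appearing twice.
all_strings = all(
    isinstance(value, str) for values in submitted.values() for value in values
)
```

Any typing or validation has to be bolted on by server-side code after the fact; the interchange format itself offers nothing.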

The second flaw is that the thin-client model tightly couples the mutually orthogonal processes of Presentation Flow and Data Interchange. It is not possible to move through the steps of the Presentation Flow without initiating what amounts to Data Interchange operations. Web pages are displayed in response to GET and POST requests sent by the browser. Even worse, every Data Interchange operation initiated by the browser willy-nilly forces a Presentation Flow. An infamous result of this tight coupling is the "browser back-button problem". When a user tries to step backwards page-by-page through the Presentation Flow, the browser re-does the Data Interchange operations that resulted in each of those pages. If any of those operations is a POST, it spells trouble, because POST operations are not safe or "idempotent". They have side-effects if re-done.

True, there are patterns such as POST-Redirect-GET that are used to work around this problem, but the inescapable fact is that the fundamental architecture is broken because of this tight coupling.
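
The back-button problem can be mimicked with a toy model. This is a deliberately simplified sketch in Python: `OrderService` and `replay` are hypothetical, and real browsers typically warn before resubmitting a form, but the underlying asymmetry between GET and POST is the same.

```python
class OrderService:
    """Hypothetical back end reached through the web server."""

    def __init__(self):
        self.orders = []

    def get_orders(self):
        # GET: only reads state, so repeating it is harmless.
        return list(self.orders)

    def post_order(self, item):
        # POST: changes state, so repeating it repeats the side effect.
        self.orders.append(item)
        return len(self.orders)


def replay(service, request):
    """Re-issue the request that originally produced a page."""
    method, payload = request
    if method == "POST":
        return service.post_order(payload)
    return service.get_orders()


service = OrderService()
history = [("POST", "concert ticket"), ("GET", None)]

# Original visit: each page is the result of one request.
for request in history:
    replay(service, request)

# Stepping backwards re-does the requests behind each page.
for request in reversed(history):
    replay(service, request)

print(service.orders)
```

The user only ever intended to buy one ticket; moving through the Presentation Flow bought a second one.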

The third flaw is that the web model is request/response. It does not support peer-to-peer interaction styles that are required for server event notification.

At this juncture, one would be tempted to conclude that AJAX is the answer to these problems. Unfortunately, AJAX itself is just a raw capability and not a prescriptive model. It is possible to use AJAX and still come up with a horrible hybrid model where the web server continues to drive Presentation Flow in response to Data Interchange operations, and the AJAX interaction just hangs off to one side, so to speak.

Clearly, what we need is a prescriptive architectural model that explicitly delineates the dos and don'ts of application design.

We call this model SOFEA (Service-Oriented Front-End Architecture).

In the SOFEA model, there is an entity called a Download Server. This role may be performed by a standard web server, but its role is very strictly restricted to enabling Application Download. It stays out of the loop thereafter and does not play a role in Presentation Flow or Data Interchange. This is the most significant difference between the SOFEA model and the traditional thin-client model.

The application is downloaded onto an Application Container, which could be a browser, a Java Virtual Machine, a Flash runtime or even the native operating system itself. The nature of the Application Container does not matter. What matters is that there is an environment within which the application can run.

The application is built using a proper MVC architecture, not the Front Controller pattern used by traditional thin-client applications. The Controller drives Presentation Flow.

The Controller also asynchronously initiates Data Interchange with the Service Interface as required.
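
A minimal sketch of that separation, written in Python with `asyncio` standing in for whatever asynchronous mechanism the Application Container offers. All names here (`FakeServiceInterface`, `Controller`, the screen names) are hypothetical and not taken from the paper. Navigation is purely local state; Data Interchange happens only when the Controller explicitly asks for it.

```python
import asyncio


class FakeServiceInterface:
    """Hypothetical stand-in for a SOAP or REST service endpoint."""

    async def submit(self, document: str) -> str:
        await asyncio.sleep(0)  # pretend there is network latency
        return "<Ack>ok</Ack>"


class Controller:
    """Client-side controller: owns Presentation Flow and initiates
    Data Interchange, but never couples one to the other."""

    SCREENS = ["search", "details", "confirm"]

    def __init__(self, service):
        self.service = service
        self.index = 0
        self.interchanges = 0

    @property
    def screen(self) -> str:
        return self.SCREENS[self.index]

    def forward(self) -> None:
        # Presentation Flow only: no request leaves the client.
        self.index = min(self.index + 1, len(self.SCREENS) - 1)

    def back(self) -> None:
        # Going back is also local, so nothing is re-submitted.
        self.index = max(self.index - 1, 0)

    async def send(self, document: str) -> str:
        # Data Interchange only: the current screen does not change.
        self.interchanges += 1
        return await self.service.submit(document)


async def main():
    controller = Controller(FakeServiceInterface())
    controller.forward()
    controller.forward()
    reply = await controller.send("<Order><Item>ticket</Item></Order>")
    controller.back()
    controller.back()
    return controller.screen, controller.interchanges, reply


result = asyncio.run(main())
```

Four navigation steps and exactly one interchange: stepping back through the screens does not replay the order.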

We stipulate that a SOFEA-conforming application will support both SOAP and REST-based Data Interchange through XML documents. This is the seamless interface that we envisage between the Presentation Tier and the Service Tier. These XML documents, being specified through a schema definition language, will enforce data structures, data types and data constraints at the point of capture and ensure data integrity end-to-end. The XML documents of the service interface correspond to a representation of the domain, or (to a Java programmer) Data Transfer Objects (DTOs). They're ideal for transferring data marshalled from domain objects or unmarshalled into domain objects. Thus, while XML documents are a connection point between the Presentation and Service Tiers, they also decouple the client from the domain model, which is as it should be.
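
To show what "structures, types and constraints" buys at the point of capture, here is a hand-rolled check in Python. It is only a stand-in for real schema validation (Python's standard library has no XML Schema validator), and `ORDER_RULES`, `validate` and the `<Order>` document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: element name -> (type, constraint).
ORDER_RULES = {
    "Item": (str, lambda value: len(value) > 0),
    "Quantity": (int, lambda value: value > 0),
}


def validate(xml_text: str, root_tag: str, rules: dict) -> list:
    """Return a list of violations; an empty list means the document conforms."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != root_tag:
        errors.append(f"expected root <{root_tag}>, got <{root.tag}>")
    for name, (convert, constraint) in rules.items():
        text = root.findtext(name)
        if text is None:
            errors.append(f"missing <{name}>")  # data structure
            continue
        try:
            value = convert(text)
        except ValueError:
            errors.append(f"<{name}> is not a valid {convert.__name__}")  # data type
            continue
        if not constraint(value):
            errors.append(f"<{name}> violates its constraint")  # data constraint
    return errors


good = validate(
    "<Order><Item>ticket</Item><Quantity>2</Quantity></Order>", "Order", ORDER_RULES
)
bad = validate("<Order><Quantity>-7</Quantity></Order>", "Order", ORDER_RULES)
```

Run on the client before the document is sent, a check like this rejects the malformed order before it ever reaches the Service Tier, which a bag of name-value strings cannot do.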

For maximum flexibility, Data Interchange should support Peer-to-Peer interaction. A strict client/server delineation as in the case of traditional thin-clients is likely to be limiting.

The SOFEA model unifies the "thin" and "rich" client models. Indeed, these labels are meaningless when systems conform to the SOFEA model. The differences between technologies form a spectrum based on the application's footprint and its startup time. There is no sharp demarcation between thin and rich clients anymore.

The SOFEA model also highlights why there is a plethora of web frameworks. SOFEA explicitly repudiates Front Controller as an anti-pattern. Driving Presentation Flow from the web server is tied to a fundamental architectural flaw (the coupling to Data Interchange). Therefore no web framework can ever be satisfactory. Continued innovation in search of a "better" web framework is, we believe, completely misguided.

The SOFEA model can be implemented through any of the following modern technologies:

I DHTML/AJAX frameworks for Current Browsers

1. Handcoded with third party JavaScript libraries
2. Google Web Toolkit (GWT, GWT-Ext)
3. TIBCO General Interface Builder

II XML Dialects for Advanced Browsers

4. XForms and XHTML 2.0
5. Mozilla XUL
6. Microsoft Silverlight/XAML

III Java frameworks

7. Java WebStart (with/without Spring Rich Client)
8. JavaFX

IV Adobe Flash-based frameworks

9. Adobe Flex
10. OpenLaszlo

Obviously, different technologies will support, out of the box, different subsets of the SOFEA model. It's up to designers to consciously apply SOFEA principles to their application design, making the appropriate decisions with regard to their technology of choice.

Our contribution through the SOFEA model is the following:

1. A renewed emphasis on data integrity, which traditional thin-client technology does not and cannot enforce.
2. A cleaner architectural model that decouples the orthogonal concerns of Presentation Flow and Data Interchange.
3. Affirmation of MVC rather than Front Controller as the natural pattern to control Presentation Flow.
4. Unification of the thin-client and rich-client models, now seen as an artificial distinction.
5. Support for SOAP- and REST-based business services, and a natural integration point between the Presentation and Service Tiers.
6. Positioning the web server as a Download Server alone. The evils of web server involvement in Presentation Flow and Data Interchange are avoided.

We believe we have provided an architectural blueprint for the Presentation Tier, with clear dos and don'ts, while still supporting considerable diversity in technologies. Life above the Service Tier now lives by similar rules as life within it.

Tell us what you think.