Thursday, December 27, 2007

Why RPC is Evil, Redux

Why am I blogging about this again? I talked about RPC being evil before.

Well, Mark Little referred to my post on Paying the RESTafarians Back in Their Own Coin and went into some detail on RPC and distributed systems in general. I'm sure that Mark (having a PhD in Distributed Systems) understands the fundamental issue with RPC, but since he doesn't make the point obviously enough in his post, it's possible that some of his readers may not get it. So let me illustrate my point with an actual example.

To recap: RPC is evil because it tries to make remote objects look local, and the illusion fails subtly in certain use cases that are quite common, damaging the integrity of the applications involved. When I say "object" here, I am not referring to objects in the OO sense, but anything that exists in memory, even an integer.

When I first learned C programming (in 1986 - man, has it been 21 years already?), I learned that swapping two numbers with a function call wasn't so straightforward. The following naive example doesn't work:

main()
{
int a = 5;
int b = 6;

printf( "Before swapping, a = %d and b = %d\n", a, b );

swap( a, b );

printf( "After swapping, a = %d and b = %d\n", a, b );
}

void swap( int x, int y )
{
int temp = x;
x = y;
y = temp;
}

The printout of the program will be:

Before swapping, a = 5 and b = 6
After swapping, a = 5 and b = 6

Clearly, the naive approach doesn't work, because C functions are called with pass-by-value semantics. This means that the swap() function is given copies of the two variables a and b. What it swaps are the copies. The original variables a and b in the calling function are unchanged.

The correct way to do the swap in C, it turns out, is to pass the memory addresses of the two variables to the swap() function, which then reaches back to swap the two variables by dereferencing the addresses:

main()
{
int a = 5;
int b = 6;

printf( "Before swapping, a = %d and b = %d\n", a, b );

swap( &a, &b ); // Note: we're passing the addresses of the variables now

printf( "After swapping, a = %d and b = %d\n", a, b );
}

void swap( int *x_p, int *y_p ) // The function receives the addresses of the variables, not the variables themselves
{
int temp = *x_p; // The asterisk helps to dereference the address and get at the value held at that location
*x_p = *y_p;
*y_p = temp;
}

Now the printout says:

Before swapping, a = 5 and b = 6
After swapping, a = 6 and b = 5

Subtle point: Note that pass-by-value semantics are alive and well, but now we're passing copies of the addresses. That doesn't matter, because when we dereference the addresses, we find we're looking at the original variables (multiple copies of an address will all point to the same memory location).

Now imagine that someone comes in offering to take my swap() function and run it on a remote host. They claim that they can make the two functions (the calling main() function and the swap() function) appear local to each other, so that I don't have to change either of them in any way. Is this claim credible?

This is exactly the promise of RPC, and if you believe the promise, you're in for a great deal of trouble.

What RPC will do is place a "stub" function on the machine running main(), and a "skeleton" on the machine running swap(). To main(), the stub function will appear like the original swap() function, and to swap(), the skeleton will behave like the original main() function. Between the two of them, the stub and skeleton will attempt the hide the presence of the network between main() and swap(). How do they do this?

The stub will quietly dereference the memory addresses being passed by main() and extract the values 5 and 6. Then it will send the values 5 and 6 over the network. The skeleton will create two integer variables in the remote system's memory and populate them with the values 5 and 6, then pass the addresses of these two variables to the swap() function. The swap() function will happily swap the two variables on the remote system, and the function call will return. Our printout will say:

Before swapping, a = 5 and b = 6
After swapping, a = 5 and b = 6

What happened? Didn't we pass the addresses of the variables instead of their values? Why didn't it work?

It didn't work because you cannot make pass-by-value look like pass-by-reference, and that's because memory references are meaningless outside the system they refer to. Think about this a little bit, and how we can solve the problem.

Let's say another person approaches me now, offering to host my swap() function on a remote system, but this time, they make it clear that only pass-by-value will be supported and I will have to deal with it. In this case, I need to make some changes to my main() and swap() functions. I first declare a structure to hold two integers in a separate file as shown below:

/* File commondecl.h */
// Declare structure tag for use elsewhere
struct number_pair_st_tag
{
int first;
int second;
};

I then use this common declaration to define a structure in both the main() and swap() functions:

/* File containing main() function */
#include "commondecl.h"

main()
{
int a = 5;
int b = 6;

// Define a structure to hold two variables

struct number_pair_st_tag returned_pair_st;

printf( "Before swapping, a = %d and b = %d\n", a, b );

returned_pair_st = swap( a, b ); // Call by value and receive by value

// Explicitly assign what is returned to my original variables.
a = returned_pair_st.first;
b = returned_pair_st.second;

printf( "After swapping, a = %d and b = %d\n", a, b );
}

/* File containing swap() function */
#include "commondecl.h"

struct number_pair_st_tag swap( int x, int y )
{
// Define a structure to hold the swapped pair
struct number_pair_st_tag swapped_pair_st;

// The swapping is more direct this time, no need for a temp variable.
swapped_pair_st.first = y;
swapped_pair_st.second = x;

// Explicitly return the swapped pair by value
return swapped_pair_st;
}

The printout will now read:

Before swapping, a = 5 and b = 6
After swapping, a = 6 and b = 5

The swap worked, and it worked because there was all-round honesty this time about the fact that variables would be passed by value. Yes, I had to re-engineer my code, but now my code will work regardless of whether the two functions are local to each other or remote.

A couple of points:
1. Isn't this wasteful for local calls?
2. Isn't this brittle? There's a shared file "commondecl.h" between the two systems.

Well, yes, of course it's wasteful for local calls. But seriously, how wasteful? We're talking about in-memory copying here, which is generally insignificant compared to the network latency we always incur in the case of remote communications. True, if objects are very large and the copying is done in iterative loops, the performance hit could become significant, but the fundamental issue is that correctness trumps performance. We would rather have systems that work correctly but less efficiently than systems that work incorrectly.

As for brittleness, that's what contracts are for in the SOA world. The "commondecl.h" file is part of the contract between the two systems. In fact, in the general case, we would be declaring both the passed and the returned parameters within such a contract, and we would expect both systems to honour that contract. Coupling systems to a contract rather than to each other is not brittle, it's loose coupling.

I hope these examples make it very clear why RPC in all its forms is evil incarnate. RPC is a dishonest mechanism, because it promises to achieve something that it simply cannot deliver - a convincing illusion that remote objects are local. This is one of the reasons for Martin Fowler's First Law of Distributed Objects ("Don't Distribute your Objects"). The opposite illusion, that all objects are remote (and that therefore, everything passed between them is a copy ("pass by value")) is achievable and honest, but could be marginally inefficient when two objects are local to each other.

System integrity is paramount, and therefore we can only use honest mechanisms, even if their efficiency is suboptimal. Messaging is an honest system, because it explicitly deals in copies ("pass by value"). Distributed systems should therefore be based on a messaging paradigm.

As I said at the beginning of this post, I have delved deeper into why XML-RPC is doubly evil and SOAP-RPC is triply evil here. Hopefully it should be clear now why I'm so much against SOAP-RPC and such a fan of SOAP messaging.

3 comments:

Mark Little said...

Not only do I have a PhD in fault-tolerant distributed systems, but I've also been working in the area for over 20 years ;-)

RPC is like any tool: you use it when it makes sense. And believe it or not, it makes sense quite a lot. Let's not get into a black-and-white/good-versus-evil debate over something that has been used successfully in large and small-scale distributed systems for over 2 decades. There are shades of grey here.

Mark Little said...

Oh and I'd definitely recommend you and your readers check out the literature on RPC and stub generators going back to at least 1984. Don't let CORBA, DCE or DCOM cloud your vision on what's actually possible with a good implementation and a good set of development tools. CORBA et al are a standards approach, so we had to make compromises when working with existing implementations from IBM, Oracle, BEA etc. Not all of those compromises were right. However, if you look at some of the implementations that went over and above what CORBA defined (as an example), such as Nortel's ORB or HP-ORB, there a lot less "evil" than you might think.

prasadgc said...

Mark,

I take your point that there are shades of grey. RPC has obviously proven useful to a lot of people (which is why it has been around for so long) but my point is it doesn't comply with SOA principles.

And since SOA is good...

:-) Ganesh