Re-Reading Tanenbaum’s Critique of RPC 30 Years Later (bu.edu)
77 points by iamwil on March 28, 2022 | hide | past | favorite | 44 comments



I’m mostly lost on the discussion of RPC. This article delves into the historical and more abstract / academic ideas of RPC. By the time I started working in industry, I only knew about RPC in practice rather than theory. I’ve never bothered to learn the details of RPC, but the main idea (in practice) is one computer calling another. Servers have some identifier / address, and the RPC call involves some agreement on the message format. In other words, an API contract + some infrastructure on how to resolve computer names. If it’s external, we call it an API over HTTP. Nowadays, we just call the RPC topology as microservices, which is usually an HTTP API, just not on the public internet.

I found these bits to be golden:

> There is a widespread tendency in computing to adopt more important sounding names for something than is either warranted or is just plain wrong, such as, calling the graph of a network, a topology; calling the chief engineer on a project, the architect; calling a protocol, an API; calling almost anything a new paradigm; etc. There seems to be some deep-seated insecurity in the field that feels a need to inflate the importance of the concepts we work with.

> About every decade or so, we have to recycle all of the bad ideas of the previous generation.


It seems to me that in every group of people where a certain concept is often used, that concept will get a short name.

You are not going to write out 'the chief engineer on a project' many times. So let's take a mostly unused term, like architect, and stick with that. (Read 'The Meaning of Liff', where Douglas Adams has fun with the reverse of this.)

Same thing with RPC, you need to have a term for sending a message to a remote process asking it to execute a particular 'function' and send back the result.

At the same time, you don't want to write a lot of code in each application to marshal arguments, send a message, wait for the reply, decode the result. So it seems obvious to generate those functions from a description file. And then you have something that looks a bit like a procedure call that is remote.
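A minimal sketch of what such generated stub code boils down to (the JSON-over-TCP wire format, host names, and function names here are made up purely for illustration; a real IDL compiler emits the same marshal/send/wait/decode plumbing):

```python
import json
import socket

def marshal_request(func_name, args):
    """Encode a call as a wire message (hypothetical JSON framing)."""
    return json.dumps({"func": func_name, "args": list(args)}).encode()

def unmarshal_reply(data):
    """Decode the result out of a reply message."""
    return json.loads(data)["result"]

def make_stub(host, port, func_name):
    """What a description-file compiler would generate for each remote function:
    marshal the arguments, send the message, wait for the reply, decode the result."""
    def stub(*args):
        with socket.create_connection((host, port), timeout=5) as conn:
            conn.sendall(marshal_request(func_name, args))
            conn.shutdown(socket.SHUT_WR)       # signal end of request
            reply = b"".join(iter(lambda: conn.recv(4096), b""))
        return unmarshal_reply(reply)
    return stub

# The payoff: something that reads like a local call.
# add = make_stub("billing.internal", 9000, "add")
# total = add(2, 3)
```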

Of course, at some point the discussion derails and people start claiming that RPC is the basis of distributed computing. And then other people start writing critiques.

That doesn't change the fact that RPC is a nice concept, as long as you keep in mind what it really is and what its limitations are.


> marshal arguments, send a message, wait for the reply, decode the result

In the case of Protobuf, Google considers it to be focused merely on marshaling arguments and decoding results; it is otherwise decoupled from gRPC, which is the layer and software that relays the 1s and 0s.


Note that protobufs actually have a "service" section to define RPC endpoints: https://developers.google.com/protocol-buffers/docs/proto#se...
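For reference, such a service section looks roughly like this (message and service names here are hypothetical; protoc plugins such as gRPC's generate client stubs and server skeletons from it):

```protobuf
syntax = "proto3";

message SortRequest { repeated string words = 1; }
message SortReply   { repeated string words = 1; }

// The service section pairs request/response message types
// with named RPC methods.
service Sorter {
  rpc Sort (SortRequest) returns (SortReply);
}
```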


> in every group of people where a certain concept is often used, that concept will get a short name.

His point is not name length, it's the inherent pompousness of the new names.

We generally lack credentials. We puff up the terms for everything we do, making them seem important and official. What would Freud say?


Is a term like 'architect' pompous? What short, existing word would you use to describe who decides how something is going to be built?

Is abbreviating "remote procedure call" to RPC pompous?

I think the words 'graph' and 'topology' are about equally likely to make normal people's eyes glaze over.


Software Chief


> Nowadays, we just call the RPC topology as microservices, which is usually an HTTP API, just not on the public internet.

That's not necessarily true, though there are plenty of places that do it that way. But there are a whole slew of pub/sub mechanisms and various transports (message buses, for instance) used in the implementation as alternatives to HTTP(S).


Normally an article would include a summary of the original paper. In this case it may be justified not to do that, but at least a link would have been helpful. So, first, here's this: https://www.win.tue.nl/~johanl/educ/2II45/2010/Lit/Tanenbaum...

Second, to understand the critique I think it's helpful to understand what I would perceive as the chewing on that paper by the community over time, which can be summarized as this: The distinguishing characteristic of an "RPC call", in the sense that Tanenbaum meant, is an attempt to make a network transaction have the exact same semantics as a function call. Importantly, you should read this strictly; the idea isn't that it should be "functionish", or have some syntax sugar around a network transaction that looks a lot like a function in your language, but the idea is that it is providing an abstraction that is fully as reliable as a normal function call within that language, and can, in every way, be treated as a function call. It is a "procedure call" that just happens to be remote, but the abstraction is so complete that you can stop thinking about it entirely as a programmer.

If you add that perspective to a reading of the PDF I linked, you'll probably understand the objections made more deeply.

You now live in a world where Tanenbaum "won", because he is objectively correct that such a thing is simply not possible, so it is much harder to understand what he's banging on about here in 2022. The network "RPC" calls you're used to have backed down the promises they made, and in most languages aren't as simple as a function call. Many modern things that call themselves "RPC" instead focus on having a function-like flow in the happy case, but don't promise to magically make problems with local vs. remote references go away, and instead of jumping through hoops to try to solve the problems just have you deal with them.

I caught the tail end of the RPC world at the beginning of my programming career. It was a horrible place. You'd have these horrifyingly complex manifests of what the remote RPC call could do, and then they might get compiled into your code like "functions", and then problems as simple as "the remote function call was lost and you never got a reply" would just be a function you called that hung your program forever. Dealing with this was awful and hacky and ugly, because the function abstraction they were trying to jam themselves so hard into simply didn't have a place for handling "the function you're trying to call is missing", so you might have to declare "handlers" elsewhere that ran in entirely different contexts or who knows what garbage. Total mess. In the process of trying to make things easier than they could possibly be, they made handling errors incredibly difficult. The sloppiest modern "I vaguely take some JSON and return some other JSON" HTTP API is much preferable to this mess.
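A sketch of the modern alternative being described: instead of a stub that hangs forever when the reply is lost, the network failure surfaces as an ordinary error at the call site (wire protocol and names here are made up for illustration):

```python
import socket

def call_remote(host, port, payload, timeout=2.0):
    """A 'remote call' that refuses to pretend the network is perfect:
    a lost reply becomes an exception at the call site instead of a hang."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
            conn.shutdown(socket.SHUT_WR)
            return b"".join(iter(lambda: conn.recv(4096), b""))
    except socket.timeout as exc:
        # The classic RPC stub had no place for this case; here the caller
        # decides whether to retry, fail over, or give up.
        raise TimeoutError("no reply from remote end") from exc
```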

One of the interesting lessons of software engineering is that just because something is impossible in general doesn't mean you can't still try, and get something that sorta sometimes works some of the time if all the stars align, and then get people selling that as the next hot thing that is the most important thing in programming ever.

Despite what you may cynically think, I actually don't have any current tech in mind as I type that. It may just be a lack of perspective as I'm as embedded in the present as anyone else, but I don't feel like there's a lot of extant techs right now in the general programming space promising the impossible. Closest I can think of is the machine learning space, but my perception is that it isn't so much that machine learning is promising the impossible as that there are a lot of people who expect it to do the impossible, which isn't quite what I mean. I mean more like people selling database technology that, in order to exist, must completely break the CAP theorem (and I don't just mean shading around the edges and playing math games, but breaking it), or RPCs that fundamentally only work if all 8 fallacies of distributed computing [1] were in fact true, or techs that don't work unless time can be absolutely and completely synchronized between all nodes, and so on. I'm sure there's little bits of those here and there, but there was a time when this sort of impossible RPC was thought of as the future of programming. The software engineering space has become too diverse for things that are essentially fads to take over the whole of it like things could in the 1980s and 1990s.

(See also OO, which also has a similar story where if you learn OO in 2022, you're learning a genteel and tamed OO that has been forced to tone down its promises and adapt to the real world. The OO dogma of the 1980s and 1990s was bad and essentially impossible, and only a faint trace of it remains. Unfortunately, that "faint trace" is still in a few college programs, so it has an outsized effect, but for the most part the real world seems to break new grads of it fairly quickly nowadays.)

Finally, this should be contrasted with the modern answer that is continuing to grow, which is message passing systems. A message passing system loosens the restrictions and offers fewer guarantees, and as such, can do more. You can always layer an RPC system for your convenience on top of a message passing system, but you can't use an RPC system to implement a message passing system with the correct semantics, because the RPC system implements too much. I personally view "RPC", in the looser modern form, as a convenient particular design pattern for message passing, but not as the fundamental abstraction lens you should view your system through. Even the modern genteel form of RPC imposes too many things, because sometimes you need a stream, sometimes you need an RPC, sometimes you just need a best-effort flinging of data, etc. When you have a place you need RPC's guarantees, by all means use an established library for it if you can, but when you need something it can't do, drop it immediately and use the lower-level message bus you should have access to.

[1]: https://www.simpleorientedarchitecture.com/8-fallacies-of-di...


Essentially it all boils down to theory vs practice, where the beautiful theoretical concepts fell prey to the various fallacies of distributed computing:

https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...

Imo only Erlang really gets this right, and QnX did a halfway decent job of it at the 'OS on bare metal' level. Everything else is just endless duct tape and baling wire.


Erlang is so far the only environment I know that took the plunge of expanding the domain of "RPC call" into things that are, technically, local. Thus harmonizing local and remote calls to a much greater degree. Although it's still hard to hide the potential latency differences no matter what you do.

This is one of the reasons that even though I'm kinda down on actually using Erlang for a real system, I really recommend any aspiring language designer spend some time with it. Like Haskell, many languages have copied some of its surface aspects, but few languages are capturing its unique deep value proposition. I live without it because on the balance I prefer other things, but what would be even better is having both my later conveniences and capabilities and Erlang's wisdom.

I've noodled around with trying to construct a language where the lower guarantees of an RPC call are the default, and you have to "declare" that you want to "cast" an RPC-like API into a static local API, which is then compile-time checked for being valid to do that with. I haven't got very far beyond the "drool on a napkin" phase for it, though.


Indeed, and it is why I think the Erlang solution is the way to go: you can't get there by expanding the concept of 'local' to the special case of 'remote', but you can take the 'general case of remote' and then take some shortcuts to give you a speed advantage if the call turns out to be local. That saves you having to keep two ways of doing things in your head and makes sure you get it right every time instead of just the times that you were thinking about it hard beforehand. As for latency: yes, you can't fake that, but interestingly, if you pick the boundaries of your services carefully then you can go a surprisingly long way before that becomes a problem.

The concept you are describing sounds interesting, it would be a nice middle ground. QnX has some of this in their send/receive/reply combo, there the default destination host is the localhost, and because that is abstracted away you will always specify the counterparty so your code will already be network aware. QnX admins are what we'd call services and they can be as simple or as complex as you want, including a whole new cluster behind a simple endpoint. I've built some pretty complex installations with that (100's of servers, 1000's of processes) well before 'infrastructure as code' was a thing and the reliability and response you'd get from that was unlike anything that is on the market today.

But with Erlang, assuming your problem doesn't require a lot of computation (which escapes, rather inelegantly, to C) and fits nicely with the OTP concept (which covers almost all code that is communications and CRUD-like constructs), it is as slick as it gets, all the way to having the supervision, monitoring and start-up seamlessly built into the environment.

I wish I had a good excuse to build something for real with it.


Do you and jerf have any opinions or experience with tuplespace as an abstraction over the network?


> See also OO, which also has a similar story where if you learn OO in 2022, you're learning a genteel and tamed OO that has been forced to tone down its promises and adapt to the real world. The OO dogma of the 1980s and 1990s was bad and essentially impossible, and only a faint trace of it remains. Unfortunately, that "faint trace" is still in a few college programs, so it has an outsized effect, but for the most part the real world seems to break new grads of it fairly quickly nowadays.

Are you talking about the notion that OO is about making an ontology, and that recapitulating a poorly-thought-out subset of the Dewey Decimal System in your codebase will somehow make it easier to reason about and extend? The core of OO, that some types have shared behavior and Wouldn't It Be Nice If it were possible to have only one implementation of that behavior which could be shared among all types which share it, is, at best, orthogonal to that ontological notion, and sometimes actively anathema to it. (For example, what ontology includes, as a top-level concept, Things Which Can Be Compared To Each Other, or Things Which Can Be Collected Into A List?)

I also have another example: GOTO as a programming construct now is essentially not comparable to GOTO as it was in FORTRAN 66 or contemporary languages. Skipping ahead a few lines within a function is mostly anodyne, if not outright laudable if it gets you out of a loop without messy flag variables; skipping three pages back to the middle of a different function is the kind of thing Wirth was ranting about. Do not even mention computed or assigned GOTOs.


"Are you talking about the notion that OO is about making an ontology, and that recapitulating a poorly-thought-out subset of the Dewey Decimal System in your codebase will somehow make it easier to reason about and extend?"

I'm referring to the origin of OO in Simula, a language for writing simulations. From that origin, early OO had an extremely strong bias in favor of modelling "domain objects". In that model of OO, having an "iterator" class is actively wrong, because there's no physical object corresponding to it. That's where the Car/Engine/Tires example comes from, and to a lesser extent the Animal/Mammal/Cow etc. examples.

It's been 25 years or so, but I think when I took my software engineering class I was actively told my OO should model real things.

The irony is that I think OO shines most brightly in the abstract machinery of programming, like iterators and other machinery like that. It has problems when you try to connect it to the real world by "exporting" the purely internal state changes out to something real. This is a general problem in software and it's not like I have a perfect solution, but the OO tools tend not to be anywhere near as helpful here. (By that I mean, yes, we write the code that does this in an OO context, but inheritance/private/public/etc. isn't really that helpful in terms of keeping a particular variable in your object in sync with the database value it's supposed to reflect, for instance.) Simula somewhat deceived the early practitioners, because in Simula, state changes within the object are the desired outcome; when your Engine object flips its On flag from false to true, the object's job is done. This does not necessarily translate into the real world anywhere near as well.

And yes, goto is definitely another one of those things where you can't understand the old criticisms through a modern language lens, because the goto we have today has been tamed and brought to heel under structured programming's constraints. However, unlike alephnan's objection that he doesn't see the point of the RPC criticisms, which I think is the common case, the goto criticisms continue to live on as zombies long after the thing being criticized is actually dead and buried, injuring the more modern version.


I worked with a system recently that did remote procedure calls. You'd invoke it locally, and it would arrange for the source code of the procedure, the arguments, and the output variables to be pickled and sent to the server. Does that truly differ from an x86 executable pushing arguments on a stack and jump/returning to a subroutine?


Yes, it does. The failure modes are different, and so the switch from a local call to an RPC is not transparent. Which is what Tanenbaum discusses in the original paper, and which this review mostly talks past, or restates but misinterprets.

https://www.win.tue.nl/~johanl/educ/2II45/2010/Lit/Tanenbaum...


Many of the things he says in that article bear no resemblance to how problems are solved in the modern era.

Nobody other than Andrew seems to believe strongly that "transparency" is a required component of remote procedure calls.


RPC is literally calling a remote function over the network without the infrastructure of HTTP - you implement it in your code not too different from a normal function call (you do have to use an Interface Description Language to mark some parameters as "in" and "out").

It's a substandard way to implement distributed computing, because:

- it assumes you and the remote end are in lockstep which is hard to guarantee at scale;

- if you and your remote end are not in lockstep you get weird hangs in your program because you are pretending code is on your machine when it's not;

- an infrastructure must exist to make sure you are calling the remote code you intend, including things like component registration, etc. and for the amount of work required to make that work well, you might as well write a TCP/IP server. With TLS and other authentication mechanisms, current CPU speeds and JSON and HTTP becoming de facto messaging formats you might as well use all of that for remote API access instead of literally trying to send what's pretty much raw function calls over a wire.


It seems that your first two points are an argument against synchronous RPC. These days asynchronous functions are frequently used in-process, so RPC calls that use the same async abstractions seem appropriate.

I guess the historical background is that the RPC syntactic sugar implied sync calls.

Regarding your last point, there is no reason why a language level RPC implementation cannot map to TLS/HTTP/JSON as the transport layer and encoding.

The real issue with RPC is that only a small subset of all the types supported in a language can be sanely supported, and it just ends up being syntactic sugar for message passing plus matching of replies to requests. There is nothing wrong with this, but it is indeed a leaky abstraction.
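A toy illustration of that "syntactic sugar for message passing plus matching of replies to requests" point: both directions are plain messages, and the RPC pattern is just tagging each request with a correlation id and waiting for the matching reply (all names here are made up; a real system would have the two queues on opposite ends of a network):

```python
import itertools
import queue

class MessageBus:
    """Toy in-process message passing. 'RPC' is the layer on top that
    correlates replies with requests."""
    def __init__(self):
        self.inbox = queue.Queue()    # messages to the 'remote' end
        self.outbox = queue.Queue()   # messages back to callers
        self._ids = itertools.count()

    def send_request(self, func, args):
        corr_id = next(self._ids)
        self.inbox.put({"id": corr_id, "func": func, "args": args})
        return corr_id

    def wait_reply(self, corr_id, timeout=1.0):
        # Naive matching: real systems keep a map of pending ids.
        msg = self.outbox.get(timeout=timeout)
        assert msg["id"] == corr_id
        return msg["result"]

def serve_one(bus, handlers):
    """The 'remote' side: pull one request message, push one reply message."""
    msg = bus.inbox.get(timeout=1.0)
    bus.outbox.put({"id": msg["id"], "result": handlers[msg["func"]](*msg["args"])})
```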


Yes - I see other posters trying to spin the article to make it only an argument against "old" RPC solutions (CORBA I guess?). This seems to be an attempt to leave an escape hatch for their pet RPC solutions that happen to be more modern.

When I first joined my workplace, some of the resident engineers were overly disposed to cargo culting google practices. I had a robotics application that just screamed for using something like ROS, but I was encouraged to use gRPC to underpin these microservices. I was inexperienced enough to assume they knew something I didn't, but I kept scratching my head trying to figure out why this was supposed to be better than a simple pub-sub system. It felt like a huge impedance mismatch with what was actually happening at the message level. When I asked myself whether I could use gRPC in a simpler, pub-sub-like mode, I instead found a bunch of janky projects where people were trying to build pub-sub on TOP of gRPC (?!) It wasn't until later when some of the cargo-culters left and new folks with a robotics background joined the company that I realized I had been right all along.

Reading this article was cathartic because it expressed a truth I didn't know I was allowed to think - I don't _want_ a procedure-like interface! It is simultaneously simpler and more correct to think in terms of message-oriented IPC. RPC seems to be about letting people think in a lazy way about how their distributed application works... but you inevitably end up saddling them with a long tail of things that were brushed under the rug.


Not CORBA - RPC preceded it by at least a decade (at least on Unix and possibly VMS). You defined an interface using a description language (XDR) which generally produced both client and server code. edit: there was also a utility "rpcgen" that automated a lot of stuff.

Sun RPC was (at least I think) the most popular version of this. https://en.wikipedia.org/wiki/Sun_RPC

The generated code looked after all the marshalling/unmarshalling of your types, did the connection handshaking etc. You had to have a service running on your Unix machine (rpc.bind?). NFS used to use XDR and some variant of RPC to work but I am not sure it does any more.


> NFS used to use XDR and some variant of RPC to work but I am not sure it does any more.

Yep, still does [1]. Even 4.1 and 4.2 (the most modern variants of NFS) all start with .x files and XDR encoding. The ancient names codified by Sun still exist and still work (and the tooling still works for introspection as a nice bonus).

[1] https://github.com/nfs-ganesha/nfs-ganesha/tree/next/src/Pro...


I guess the whole Sun set of tools was built on RPC: All the yellow pages stuff, automounting your home directory etc. Never really thought about it too hard when I was a Sun administrator back in the 1990s.

If somebody gave me a choice of CORBA or Sun RPC, I'd still pick Sun RPC. CORBA was a nightmare.


The problem with that last bit is it’s just a prescriptivist rant. You can complain as loud as you like that people are using language wrong, if enough people use it like that the dictionary gets updated.


   sort <infile | uniq | wc –l > outfile
> [...] This construct uses a push down stack, so that the output of what is inside the angle-brackets is the input to sort

It doesn't really invalidate the argument, but unless I'm misunderstanding their statement, I think the author might have misparsed the above.


FWIW, in Tanenbaum's original paper (which the author is reading), the correct meaning of the example is given quite plainly:

"consider a simple UNIX pipeline: sort <infile | uniq | wc -l >outfile that sorts infile, an ASCII file with one word per line, strips out the duplicates, and prints the word count on outfile."

So yes, it does seem the author missed at least that part. Not sure about the rest of their arguments, but at this point, I'd say it's easy enough to find the original paper, which does seem quite readable.

https://www.cs.vu.nl/~ast/Publications/Papers/euteco-1988.pd...


I am thankful to you for highlighting this part of the long paper. That completely wrong assertion (the input to sort is infile, nothing more, nothing less) casts the rest of the content in doubt (at least it does to me).


Wait, you're saying that the author of several textbooks on OSs and author of MINIX is, somehow, not trustworthy due to one perhaps inaccurate interpretation?

Harsh. ;-)


Tanenbaum did not make the inaccurate interpretation of the:

  sort < infile | uniq | wc -l > outfile
line. John Day, the author of the linked article, was the one who misunderstood it.


Well, and Tanenbaum's original paper is very clear on what the statement does and how to interpret it: https://www.cs.vu.nl/~ast/Publications/Papers/euteco-1988.pd...


That's a pretty basic misunderstanding and shows the author may not be qualified to even discuss this concept or, more charitably, that they are not from a UNIX background. I wonder if they would have read the same example without redirection of the output as a case of mismatched brackets ;)


From my point of view, Tanenbaum's critique is basically that RPC is an exceedingly leaky abstraction.

I don't think very much has changed in the state of the art to address that point: you still need to understand that your arguments will need to be marshaled (which restricts what types make sense in languages that support pointers, references etc), you still need to deal with exceptions that result from the RPC mechanism as opposed to business logic, you still need to worry about whether those exceptions mean the procedure completed, or not, or when it might've failed, which forces you to structure the code for idempotency and so on.
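For instance, the idempotency point can be sketched like this (hypothetical names; the backend class stands in for a remote service that deduplicates by idempotency key, so a retry after a lost reply doesn't apply the operation twice):

```python
import uuid

class DedupingBackend:
    """Stand-in for a remote service that remembers idempotency keys."""
    def __init__(self):
        self.applied = {}   # idempotency key -> result

    def charge(self, key, amount):
        # A retried request with the same key returns the original result
        # instead of charging again.
        if key not in self.applied:
            self.applied[key] = f"charged {amount}"
        return self.applied[key]

def charge_with_retry(backend, amount, attempts=3):
    # One key per logical operation, reused across retries: the caller can't
    # know whether a lost reply meant 'never ran' or 'ran, reply dropped'.
    key = str(uuid.uuid4())
    last_err = None
    for _ in range(attempts):
        try:
            return backend.charge(key, amount)
        except OSError as err:      # stand-in for a network failure
            last_err = err
    raise last_err
```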


> What would a true remote procedure call in a distributed programming language look like?

Have a look at QnX and at the Erlang/OTP framework and virtual machine (called BEAM).


Tanenbaum's original critique, which is quite readable, for reference.

https://www.cs.vu.nl/~ast/Publications/Papers/euteco-1988.pd...


I find it interesting that one of Tanenbaum’s criticisms, that RPC needed to be part of the language, has indeed been addressed. Pretty much every language now has Promise-like semantics that do, indeed, allow you to use a network call in a similar manner to a regular procedure call. But we had to change the semantics of the average programming language to do it.

His take that it needed to be part of the OS seems to have proved incorrect. TCP and UDP are definitely part of the OS, but HTTP sits in the user layer without people blinking an eye.
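As a small illustration of that Promise-like shape in Python's asyncio (function and field names here are hypothetical): the call site reads like a procedure call, but the await and the explicit timeout are where the remoteness is allowed to show through:

```python
import asyncio

async def fetch_user(user_id):
    """Stand-in for a network call; the await point is where latency,
    cancellation, and timeouts can surface."""
    await asyncio.sleep(0)          # pretend network round trip
    return {"id": user_id, "name": "alice"}

async def main():
    # Reads like a procedure call, but failure modes are explicit:
    user = await asyncio.wait_for(fetch_user(42), timeout=1.0)
    return user["name"]
```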


It's worth mentioning that the primary hazard, that networks fail and this makes a mess of function types, does not necessarily apply on shared memory systems.

Specifically, when 'remote' means a CPU in the same address space, communicating over a reliable medium (if PCIe fails, so does the node, which ~= perfectly reliable), then pointers can be passed around without problem and the function call can look and behave exactly as if it were running on the local CPU.


>It is our contention that a large number of things may now go wrong due to the fact that RPC tries to make remote procedure calls look exactly like local ones, but is unable to do it perfectly.

This is precisely where I feel like there has been substantial progress. RPC frameworks like Cap'n Proto do an excellent job of providing network-aware APIs that give you precise control over network IO.


I think if the paper can start with the taxtonomy and position the Rpc in it as in the box it would be much better. It is more like he is talking to himself. Too much assumption on the reader. Know rpc, Ipc, Fortran, cobol, … sorry bad writing and need too much effort may I say.

What he want to say vs what he said


> sorry bad writing and need too much effort may I say.

I’m sorry but his writing was much more intelligible than yours.

This writing is an academic critique by an academic. Some context is expected and they’re writing for their audience ( academics and people who have used RPC or have an opinion on them ).


Pretty sure you missed the joke.

And I agree with OP that the linked article is pretty impenetrable - he does seem to have something to say, but he doesn't do a good job of saying it. He starts off with what seems to be a very pedantic criticism of the term "RPC" which may be perfectly valid, but he doesn't back it up at all other than to say that procedures exist in programming languages.


Without defending the other commenter, I'll say I agree about the article being poorly written. I'm a slow & careful reader, and I found the poor grammar and confusing constructions difficult to follow.


Problem is that RPC doesn't take into account how data is managed, and that you're basically dealing with microservices.


RPC goes far beyond 'microservices' as a concept unless you use for instance AWS 'Lambda' functions as a reference point. Whoever answers the RPC could be anything, and it could be a very large monolith rather than a microservice.



