Great that someone has written this down. These points should be stressed as much as possible.
Anyway, after you move from RPC to message passing and then to state machines with delayed action execution, you face the next big problem: message queues between components are not perfectly elastic and can run out of memory if the components run at widely different speeds.
This is mostly not a problem inside a datacenter, where admins take care not to run out of resources. In the Internet environment, though, it is often the common case rather than a rare occurrence.
I haven't seen this problem addressed so far, whether in software or in theory. Any pointers to relevant work are welcome.
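One mitigation I've seen in practice (not a full solution to the theory question) is backpressure: bound the queue and let a blocking or rejected enqueue throttle the fast producer down to the consumer's pace, instead of buffering without limit. A minimal single-process sketch in Python; the capacity and item counts are arbitrary, chosen just for illustration:

```python
import queue
import threading

# A bounded queue: put() blocks when the queue is full, so a fast
# producer is throttled to the consumer's pace instead of growing
# an unbounded buffer and eventually exhausting memory.
q = queue.Queue(maxsize=8)  # small capacity, for illustration
consumed = []

def producer():
    for i in range(100):
        q.put(i)        # blocks whenever the queue already holds 8 items
    q.put(None)         # sentinel: no more work

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        consumed.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```

Across a network you can't block the sender's thread directly, but the same idea shows up as bounded TCP windows, credit-based flow control, or an explicit "busy, retry later" reply; the alternative is to shed load by dropping messages.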
I have never made use of reification of actions in real software before. Can anybody weigh in on the costs, methods, tools, or any other aspect of extensive action reification? I'm gonna have to experiment with this.
You're absolutely right, I meant to say "extensive" action reification, in the sense that the author claims you must reify "all your actions"—although, indeed, I've never used the GOF command pattern.
In any case, perhaps the question I meant to ask was, "Does anybody care to share some thoughts on designing systems with complete action reification?"
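Not speaking for the author, but the GoF command pattern is the usual starting point: each action becomes a plain value with an execute step, so it can be queued, logged, delayed, replayed, or undone instead of being a direct call. A minimal sketch (the account/amount domain and all names are my own, purely for illustration):

```python
from dataclasses import dataclass

# Each action is reified as a data object plus execute()/undo(),
# so actions can be stored in a log and run (or re-run) later.

@dataclass
class Deposit:
    amount: int
    def execute(self, balance: int) -> int:
        return balance + self.amount
    def undo(self, balance: int) -> int:
        return balance - self.amount

@dataclass
class Withdraw:
    amount: int
    def execute(self, balance: int) -> int:
        return balance - self.amount
    def undo(self, balance: int) -> int:
        return balance + self.amount

def replay(actions, balance=0):
    # Because actions are values, the same log deterministically
    # reconstructs the state -- the usual payoff of reification.
    for a in actions:
        balance = a.execute(balance)
    return balance

log = [Deposit(100), Withdraw(30), Deposit(5)]
balance = replay(log)
```

The costs are mostly boilerplate (one class or record per action) and the loss of ordinary stack traces, since "who called what" becomes "who enqueued what"; the payoff is that persistence, audit logs, undo, and deferred execution all fall out of the same representation.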
"The Systems Bible", also known as Systemantics, is a classic and highly recommended. It's probably not what you expect: it covers systems in general, and large complex systems in particular. It's not specific to computer systems let alone any particular flavor of computer system - these are rules which apply to all systems, from a municipal garbage collection program to a supertanker. Yet if you've ever dealt with a real-world distributed system you'll find it surprisingly relevant and timeless.
I suspect Level 4 is to read Distributed Algorithms by Nancy Lynch, who won the Knuth Prize for this stuff. A shorter book that looks good is Distributed Systems: An Algorithmic Approach, by Sukumar Ghosh. I recently bought them both but haven't tackled them yet.
"Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions" is a classic for messaging. It's pretty advanced I guess, but beginners can understand it too.
<ignorant_comment>
why can't most cases be solved by coding all components of the distributed system as servers exposing APIs over various simple protocols (SOAP, or simpler JSON-based web APIs)? then have all the servers/components developed by different teams, maybe using different languages and platforms, and have each of them solve the problems they encounter... forget about determinism, forget about reproducible state, go 100% fluid and asynchronous
</ignorant_comment>
...oh, wait, did I just describe the hive-mind-ish (but gorgeous:P) monster that the web is evolving into? :)
(edit to add long-winded off-topic reasoning: why should we even try to understand as a whole the complex systems we create? can't we just build them out of understandable components and accept that any large distributed system will tend to evolve into something beyond the comprehension of any human mind? and accept that "100% understanding" !== "control", and that we don't even need that much control, we just need to make things work 95% of the time. just divide the problem, divide the people, and let the "hive mind" evolve... functional languages and the conceptual tools they offer us, useful as they surely are, just foster the illusion that complete understanding and control of the systems we create will continue to be possible - I'd argue that some of what we now call "distributed systems" are already tipping past the edge of "understandability", and we should slowly learn ways of controlling their evolution and components, not the systems as a whole...)
Because sometimes you need the systems to cooperate, not just serve data. The problems the OP describes all have to do with sharing state. The protocols you describe are stateless.
I know it was a bit off topic, that's why I labeled it as 'ignorant_comment', but... in how many types of problems do we really need an actual "shared state"? can't this shared state be something like the "sum of states of the system components that expose stateless APIs", like an "emergent property type of state"? Even for real problems like how many social networks centered around a person with property X have a network property Z we only need approximate answers...
I was just expressing an intuition that as we go past "Level X" (put a large number in there), keeping the OP's metaphor, we may graduate out of this need for a clearly defined shared state for most of the problems we are trying to solve... we may think in terms of a "perceivable state" that gives us a probability for the system to be in a certain "'real' state"...
(think of the human consciousness or self, we imagine that it really exists, that there actually is an "I" or "self", we consider "my mind in this second" as a "state"... but it can just as well be seen as an emergent property, a perceived state that has a certain probability to exist in a certain way based on the states of zillions of components more or less well connected)
yes. but imagine how dog breeders influence where a certain dog breed's evolution is going... I imagine a day when some computer systems will be engineered this way, with a good mathematical framework for evolution and all... and I think we're closer to this than it may seem, just look at how programming languages "evolved" and were "selected" over the last half century... everything was very empirical, otherwise we'd all be coding in a Lisp for experimental programming and an ML-family language for the rest, maybe...
...anyway, it's going very off topic so better leave it and let's get back to ..."work" :|
Hm, I'd suggest that Level 4 is using virtual synchrony via a group multicast protocol (typically Totem) that provides strong guarantees with regard to Lamport ordering.