The Game of Distributed Systems Programming

rumcajz · on June 27, 2012

Great that someone has written this down. These points should be stressed as much as possible.

Anyway, after you move from RPC to message passing and then to state machines with delayed action execution you are facing the next big problem: the fact that message queues between the components are not perfectly elastic and can run out of memory if different components run at widely different speeds.

This is mostly not a problem inside a datacenter, where admins take care not to run out of resources. In the Internet environment, though, it is often the common case rather than a rare occurrence.

I haven't seen this problem addressed so far, whether in software or in theory. Any pointers to relevant work are welcome.
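To make the problem concrete, here is a minimal sketch, not a solution. It assumes a single producer and a single consumer joined by a bounded in-process queue, with time modelled as discrete ticks. The function name `simulate` and all of its parameters are hypothetical and chosen only for illustration.

```python
import queue


def simulate(ticks, produce_per_tick, consume_per_tick, capacity):
    """Run a producer and a consumer at different speeds over a bounded queue.

    Returns (delivered, rejected, peak_depth). All names are illustrative.
    """
    q = queue.Queue(maxsize=capacity)
    delivered = rejected = peak_depth = 0

    for tick in range(ticks):
        # Producer side: enqueue without blocking.
        for i in range(produce_per_tick):
            try:
                q.put_nowait((tick, i))
            except queue.Full:
                # The queue is not elastic: the sender has to drop,
                # retry later, or slow down.
                rejected += 1

        peak_depth = max(peak_depth, q.qsize())

        # Consumer side: drain at its own (possibly slower) rate.
        for _ in range(consume_per_tick):
            try:
                q.get_nowait()
                delivered += 1
            except queue.Empty:
                break

    return delivered, rejected, peak_depth


if __name__ == "__main__":
    # Producer three times faster than the consumer.
    print(simulate(ticks=10, produce_per_tick=3, consume_per_tick=1, capacity=5))
```

With matched speeds the queue stays shallow and nothing is rejected. Once the producer outruns the consumer, the queue fills within a couple of ticks and every further message has to be dropped, retried, or throttled at the source. An unbounded queue would only trade the rejections for unbounded memory growth.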

vicaya · on June 27, 2012

Level 4: You realize all abstractions are leaky. Yet, you feel comfortable using appropriate abstractions based on the use cases.

digeridoo · on June 27, 2012

I'm not sure about the OP, but I'm pretty sure you have built distributed systems in practice. :)

vicaya · on June 27, 2012

Hehe, I have only built small systems up to a few thousand nodes though, a long way to go to million autonomous node systems :)

Then there is level 11: You're Jeff Dean.

mvzink · on June 27, 2012

I have never made use of reification of actions in real software before. Can anybody weigh in on the costs, methods, tools, or any other aspect of extensive action reification? I'm gonna have to experiment with this.

toolslive · on June 27, 2012

I'm pretty sure you're lying ;) . If you've ever used the GOF command pattern or an object representing an action, you've used reification.
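A minimal sketch of what "an object representing an action" could look like, assuming a toy message handler. The names `Send`, `Log`, `on_message`, and `execute` are hypothetical and not taken from the article or from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Send:
    """A reified action: describes a message to send, but does not send it."""
    target: str
    payload: str


@dataclass(frozen=True)
class Log:
    """A reified action: describes a log line, but does not write it."""
    text: str


def on_message(state, msg):
    """Pure transition: returns the new state plus the actions to run later."""
    new_state = state + 1
    actions = [Log(f"seen {new_state} messages"), Send("peer", f"ack:{msg}")]
    return new_state, actions


def execute(actions, outbox, log):
    """Interpreter: the only place where side effects actually happen."""
    for action in actions:
        if isinstance(action, Send):
            outbox.append((action.target, action.payload))
        elif isinstance(action, Log):
            log.append(action.text)
        else:
            raise TypeError(f"unknown action: {action!r}")


if __name__ == "__main__":
    state, actions = on_message(0, "hello")
    outbox, log = [], []
    execute(actions, outbox, log)
    print(state, outbox, log)
```

Because `on_message` only returns data, it can be tested, logged, or replayed without touching a network. Swapping `execute` for a different interpreter changes what the actions do without changing the decision logic.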

mvzink · on June 27, 2012

You're absolutely right, I meant to say "extensive" action reification, in the sense that the author claims you must reify "all your actions"—although, indeed, I've never used the GOF command pattern.

In any case, perhaps the question I meant to ask was, "Does anybody care to share some thoughts on designing systems with complete action reification?"

mvzink · on July 2, 2012

Update (in case anybody sees this): an example of the sort of information I was looking for can be found here: http://collectiveidea.com/blog/archives/2012/06/28/wheres-yo...

rrc · on June 27, 2012

Can somebody recommend some good books on building distributed systems, both introductory and more advanced?

rdtsc · on June 27, 2012

http://learnyousomeerlang.com

It is also free.

After learning the syntax, check out these chapters

* "Designing a Concurrent Application" http://learnyousomeerlang.com/designing-a-concurrent-applica...

* "Buckets of Sockets" http://learnyousomeerlang.com/buckets-of-sockets

* "Distribunomicon" http://learnyousomeerlang.com/distribunomicon

* "Distributed OTP Applications" http://learnyousomeerlang.com/distributed-otp-applications

shykes · on June 27, 2012

"The Systems Bible", also known as Systemantics, is a classic and highly recommended. It's probably not what you expect: it covers systems in general, and large complex systems in particular. It's not specific to computer systems let alone any particular flavor of computer system - these are rules which apply to all systems, from a municipal garbage collection program to a supertanker. Yet if you've ever dealt with a real-world distributed system you'll find it surprisingly relevant and timeless.

DennisP · on June 27, 2012

I suspect Level 4 is to read Distributed Algorithms by Nancy Lynch, who won the Knuth Prize for this stuff. A shorter book that looks good is Distributed Systems: An Algorithmic Approach, by Sukumar Ghosh. I recently bought them both but haven't tackled them yet.

SkyMarshal · on June 27, 2012

Distributed Systems: Principles and Paradigms, by Andrew Tanenbaum (Minix creator, Linus Torvalds's OS newsgroup nemesis).

1. http://www.amazon.com/Distributed-Systems-Principles-Paradig...

Avalaxy · on June 27, 2012

"Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions" is a classic for messaging. It's pretty advanced I guess, but beginners can understand it too.

nnq · on June 27, 2012

<ignorant_comment> why can't most cases be solved by coding all components of the distributed system as servers exposing APIs by various simple protocols (SOAP or simpler json based web APIs)... then have all servers/components developed by different teams and maybe using different languages and platforms and have each of them solve the problems they encounter... forget about determinism, forget about reproducible state, go 100% fluid and asynchronous </ignorant_comment>

...oh, wait, did I just describe the hive-mind-ish (but gorgeous:P) monster that the web is evolving into? :)

(edit to add long winded off topic reasoning: why should we even try to understand as a whole the complex systems we create? can't we just build them out of understandable components and accept the fact that any large distributed system will tend to evolve into something beyond the comprehension of any human mind and accept that "100% understanding" !== "control" and that we don't even need that much control, we just need to make things work 95% of the time. just divide the problem, divide the people and let the "hive mind" evolve... functional languages and the conceptual tools they offer us, useful as they surely are, just foster the illusion that complete understanding and control of the systems we create will continue to be possible - I'd argue that some of what we now call "distributed systems" are already tipping past the edge of "understandability" and we should slowly learn ways of controlling their evolution and components and not the systems as a whole...)

tinco · on June 27, 2012

Because sometimes you need the systems to cooperate, not just serve data. The problems the OP describes all have to do with sharing state. The protocols you describe are stateless.

nnq · on June 27, 2012

I know it was a bit off topic, that's why I labeled it as 'ignorant_comment', but... in how many types of problems do we really need an actual "shared state"? can't this shared state be something like the "sum of states of the system components that expose stateless APIs", like an "emergent property type of state"? Even for real problems like how many social networks centered around a person with property X have a network property Z we only need approximate answers...

I was just expressing an intuition that as we go past "Level X" (put a large number in there), keeping the OP's metaphor, we may graduate out of this need for a clearly defined shared state for most of the problems we are trying to solve... we may think in terms of a "perceivable state" that gives us a probability for the system to be in a certain "'real' state"...

(think of the human consciousness or self, we imagine that it really exists, that there actually is an "I" or "self", we consider "my mind in this second" as a "state"... but it can just as well be seen as an emergent property, a perceived state that has a certain probability to exist in a certain way based on the states of zillions of components more or less well connected)

toolslive · on June 27, 2012

not all evolution converges, and even if it would converge, you might want to influence where it's going, no?

nnq · on June 27, 2012

yes. but imagine how dog breeders influence where a certain dog breed's evolution is going... I imagine a day when some of the computer systems will be engineered this way, with a good mathematical framework for evolution and all... and I think we're closer to this than it may seem, just look how programming languages "evolved" and were "selected" in the last half a century... everything was very empirical, otherwise we'd all be coding in a Lisp for experimental programming and an ML family language for the rest maybe...

...anyway, it's going very off topic so better leave it and let's get back to ..."work" :|

inopinatus · on June 27, 2012

Hm, suggest that level four is using virtual synchrony via a group multicast protocol (typically, Totem) that provides strong guarantees with regard to Lamport ordering.
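For readers unfamiliar with Lamport ordering, a minimal sketch of a Lamport logical clock follows. This is not Totem and not virtual synchrony, only the basic timestamp rule that such ordering guarantees are usually stated in terms of. The class name `LamportClock` and its method names are hypothetical.

```python
class LamportClock:
    """Logical clock: counts events rather than measuring wall time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, jump past both the local clock and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


if __name__ == "__main__":
    a, b = LamportClock(), LamportClock()
    a.tick()                     # local event on a
    stamp = a.send()             # a sends a message
    received = b.receive(stamp)  # b receives it
    print(stamp, received)
```

The rule guarantees that if one event causally precedes another, its timestamp is smaller. The converse does not hold: a smaller timestamp alone does not prove causality, which is one reason group protocols layer stronger delivery guarantees on top.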

factorialboy · on June 27, 2012

Repost? I'm sure I've read this before..

nicolast · on June 27, 2012

http://news.ycombinator.com/item?id=3765399

factorialboy · on June 27, 2012

Can't believe I get downvoted for mentioning the obvious. :-/

dodothelast · on June 27, 2012

Level 5: XMPP