The Byzantine Generals Problem (1982) [pdf]

mjb · on Dec 4, 2014

If you're interested, here's some other interesting reading in this area:

* Castro and Liskov, "Practical Byzantine Fault Tolerance", http://www.pmg.lcs.mit.edu/papers/osdi99.pdf As the title says, this paper describes a practical consensus algorithm that tolerates Byzantine failures. In some ways it is provably optimal.

* Lamport, "Leaderless Byzantine Paxos", http://research.microsoft.com/en-us/um/people/lamport/pubs/d... Interesting follow-on from Castro and Liskov, removing the role of the leader.

* Driscoll, "Murphy was an Optimist", http://www.rvs.uni-bielefeld.de/publications/DriscollMurphyv... These things really happen in practice.

* van Renesse et al, "Byzantine Chain Replication", http://www.cs.cornell.edu/home/rvr/newpapers/opodis2012.pdf Very fast replication in a model that allows Byzantine failures.
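
A note on the "provably optimal" remark about Castro and Liskov: one likely reading is replica count. PBFT needs 3f + 1 replicas to tolerate f Byzantine replicas, which matches the 3m + 1 bound the 1982 paper gives for oral messages. The sketch below only does that arithmetic. The function names are made up for illustration and come from neither paper.

```python
def min_replicas(f: int) -> int:
    """Smallest replica count that tolerates f Byzantine faults (3f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def quorum_size(f: int) -> int:
    """Quorum of 2f + 1 replicas, as used when there are exactly 3f + 1 replicas."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


def max_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3
```

For example, three replicas cannot tolerate even one Byzantine fault under this bound, while four can tolerate one.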

th3iedkid · on Dec 4, 2014

One more I would like to add to this list:

Tolerating Byzantine Faults in Database Systems using Commit Barrier Scheduling [http://people.csail.mit.edu/benmv/hrdb-sosp07.pdf]

kanzure · on Dec 4, 2014

Here is the explanation that Satoshi Nakamoto was using (and you should totally read most things written by Lamport):

http://web.archive.org/web/20090309175840/http://www.bitcoin...

A number of Byzantine Generals each have a computer and want to attack the King's wi-fi by brute forcing the password, which they've learned is a certain number of characters in length. Once they stimulate the network to generate a packet, they must crack the password within a limited time to break in and erase the logs, lest they be discovered. They only have enough CPU power to crack it fast enough if a majority of them attack at the same time.

They don't particularly care when the attack will be, just that they agree. It has been decided that anyone who feels like it will announce an attack time, which we'll call the "plan", and whatever plan is heard first will be the official plan. The problem is that the network is not instantaneous, and if two generals announce different plans at close to the same time, some may hear one first and others hear the other first.

They use a proof-of-work chain to solve the problem. Once each general receives whatever plan he hears first, he sets his computer to solve a difficult hash-based proof-of-work problem that includes the plan in its hash. The proof-of-work is difficult enough that with all of them working at once, it's expected to take 10 minutes before one of them finds a solution and broadcasts it to the network. Once received, everyone adjusts the hash in their proof-of-work computation to include the first solution, so that when they find the next proof-of-work, it chains after the first one. If anyone was working on a different plan, they switch to this one, because its proof-of-work chain is now longer.

After about two hours, the plan should be hashed by a chain of 12 proofs-of-work. Every general, just by verifying the difficulty of the proof-of-work chain, can estimate how much parallel CPU power per hour was expended on it and see that it must have required the majority of the computers to produce in the allotted time. At the least, most of them had to have seen the plan, since the proof-of-work is proof that they worked on it. If the CPU power exhibited by the proof-of-work is sufficient to crack the password, they can safely attack at the agreed time.
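
The chaining described above can be sketched roughly as follows. This is a toy under stated assumptions, not Bitcoin's actual block format: it uses a single SHA-256 over a made-up `prev|plan|nonce` string, a tiny difficulty so it runs in well under a second, and hypothetical helper names (`proof_of_work`, `build_chain`, `verify_chain`).

```python
import hashlib
from typing import Dict, List, Tuple

GENESIS = "0" * 64  # assumed placeholder for "no previous proof"


def _digest(prev_hash: str, plan: str, nonce: int) -> str:
    # Toy encoding of a block; the real Bitcoin header is different.
    return hashlib.sha256(f"{prev_hash}|{plan}|{nonce}".encode()).hexdigest()


def proof_of_work(prev_hash: str, plan: str, difficulty_bits: int) -> Tuple[int, str]:
    """Search for a nonce whose hash has at least difficulty_bits leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = _digest(prev_hash, plan, nonce)
        if int(digest, 16) < target:
            return nonce, digest
        nonce += 1


def build_chain(plan: str, length: int, difficulty_bits: int = 12) -> List[Dict]:
    """Chain `length` proofs-of-work, each one including the previous hash."""
    chain, prev = [], GENESIS
    for _ in range(length):
        nonce, digest = proof_of_work(prev, plan, difficulty_bits)
        chain.append({"prev": prev, "plan": plan, "nonce": nonce, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain: List[Dict], difficulty_bits: int = 12) -> bool:
    """Cheaply check the links and the difficulty of every proof in the chain."""
    target = 1 << (256 - difficulty_bits)
    prev = GENESIS
    for block in chain:
        digest = _digest(block["prev"], block["plan"], block["nonce"])
        if block["prev"] != prev or digest != block["hash"] or int(digest, 16) >= target:
            return False
        prev = digest
    return True
```

Verification costs one hash per block while building costs thousands on average, which is the asymmetry that lets each general judge how much work went into a plan.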

ekajjake · on Dec 4, 2014

I don't quite understand - what happens if two of them find a hash solution at the same time and both broadcast it? Then you have the same problem as before, right?

AgentME · on Dec 4, 2014

All users prefer the longest chain. In the case of two solutions being broadcast at about the same time, then each miner will probably prefer the solution they received first, and will base their solution attempts off of that chain. It is very unlikely that both chains will each have a new block broadcast at the same time. Whichever chain gets a new block first will be the winner and everyone will switch to that chain, abandoning the now-orphan chain.
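
A minimal sketch of that tie-breaking rule, assuming chains are plain lists of block labels and that "longest" means the most blocks (Bitcoin itself compares cumulative work rather than block count). `choose_chain` is a hypothetical name.

```python
from typing import List


def choose_chain(current: List[str], candidate: List[str]) -> List[str]:
    """Keep the chain seen first unless the candidate is strictly longer."""
    return candidate if len(candidate) > len(current) else current


# Two solutions arrive at about the same time: each miner keeps the one it saw first.
chain_a = ["genesis", "a1"]
chain_b = ["genesis", "b1"]
miner_on_a = choose_chain(chain_a, chain_b)   # tie, so this miner stays on chain_a

# Chain b gets the next block first, so the miner on chain a switches.
chain_b_extended = chain_b + ["b2"]
miner_on_a = choose_chain(miner_on_a, chain_b_extended)
```

After the second call, `miner_on_a` holds the extended b chain and block `a1` is orphaned.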

sjeohp · on Dec 4, 2014

The two chains will fall out of sync before they're finished. The chance of all 12 proofs being completed at exactly the same time, and broadcast to groups of identical computing power every step of the way, is very small.

elpachuco · on Dec 4, 2014

Murphy's law: "If it can happen, it will happen"

skj · on Dec 4, 2014

Some probabilities approach zero faster than the universe can approach heat death.

blake8086 · on Dec 4, 2014

What if it involves hash collisions?

tomp · on Dec 4, 2014

No SHA2 hash collisions have ever been found.

maaku · on Dec 4, 2014

What's the chance of that happening 12 times in a row?

(Rhetorical question, you'll find the answer in the last section of the bitcoin whitepaper.)
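
For reference, the calculation in that last section answers a related but not identical question: the probability that an attacker controlling a fraction q of the hash power ever catches up from z blocks behind. Below is a Python port of the short C routine printed there. The port and its naming are mine, so treat it as a sketch.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q catches up from z blocks behind.

    Follows the whitepaper's formula: a Poisson-distributed attacker progress
    with mean z * q / p, combined with the gambler's-ruin catch-up probability.
    """
    p = 1.0 - q                      # honest share of hash power
    lam = z * (q / p)                # expected attacker progress
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam)
        for i in range(1, k + 1):
            poisson *= lam / i       # builds lam**k * exp(-lam) / k!
        total -= poisson * (1 - (q / p) ** (z - k))
    return total


for z in (0, 1, 6, 12):
    print(z, attacker_success_probability(0.1, z))
```

The probability falls off roughly exponentially in z, which is why a tie surviving 12 rounds is not a practical concern.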

jsprogrammer · on Dec 4, 2014

Note that you only need to do all of that work (assuming you want to implement this algorithm) if the generals disagree on which plan was heard first.

AceJohnny2 · on Dec 4, 2014

This is a classic paper from 1982, which helped establish Leslie Lamport as a top distributed computing researcher.

What did the poster want to indicate?

jnks · on Dec 4, 2014

Perhaps because there's a lot of breathless press [1] about Bitcoin solving this "impossible" problem that was actually solved decades ago?

[1] http://nonchalantrepreneur.com/post/70130104170/bitcoin-and-...

zik · on Dec 4, 2014

It's simplistic to say that the problem was "solved decades ago". Every existing attempt at Byzantine fault tolerance is limited in one way or another - if you're looking for a solution it's really a matter of choosing a solution based on which set of painful limitations you want to live with.

Bitcoin's solution is a pretty good one for its use case. Most other solutions would have trouble scaling like the blockchain can, but in many applications its very long consensus lead time would be unacceptable.

typedweb · on Dec 4, 2014

Correct.

typedweb · on Dec 4, 2014

Sometimes I just use HN as a bookmarker for interesting things. I didn't expect it to make it to the front page :)

tormeh · on Dec 4, 2014

http://research.microsoft.com/en-us/people/mickens/thesaddes...

One of the funniest things I've read about tech.

codemac · on Dec 4, 2014

    JAMES: I announce my desire to go to lunch.

    BRYAN: I verify that I heard that you want to go to lunch.

    RICH: I also verify that I heard that you want to go to lunch.

    CHRIS: YOU DO NOT WANT TO GO TO LUNCH.

    JAMES: OH NO. LET ME TELL YOU AGAIN THAT I WANT TO GO TO LUNCH.

    CHRIS: YOU DO NOT WANT TO GO TO LUNCH.

    BRYAN: CHRIS IS FAULTY.

    CHRIS: CHRIS IS NOT FAULTY.

    RICH: I VERIFY THAT BRYAN SAYS THAT CHRIS IS FAULTY.

    BRYAN: I VERIFY MY VERIFICATION OF MY CLAIM THAT RICH CLAIMS THAT I KNOW CHRIS.

    JAMES: I AM SO HUNGRY.

    CHRIS: YOU ARE NOT HUNGRY.

    RICH: I DECLARE CHRIS TO BE FAULTY.

    CHRIS: I DECLARE RICH TO BE FAULTY.

    JAMES: I DECLARE JAMES TO BE SLIPPING INTO A DIABETIC COMA.

    RICH: I have already left for the cafeteria.

lectrick · on Dec 4, 2014

This is amazing. It also raises the question: why is "authority" established so much more easily "in real life" than digitally? Side-channel information?

ddispaltro · on Dec 4, 2014

Take the succession order of the commander in chief: there is a strict order of succession when shit happens. It's master election vs. the generals problem. Personally, I find master election much easier to reason about.

mkramlich · on Dec 4, 2014

use of force (cops, military)

Chris: You do not want to go to lunch.

Rich points gun at Chris.

Chris: Let me rephrase that. I misspoke. I'm sorry.

riffraff · on Dec 4, 2014

OT, but for the uninitiated who enjoyed the above and didn't know him: the James Mickens USENIX column is absolutely hilarious and insightful.

Other examples can be found towards the bottom of his MSR page[0] (search for "humor").

[0] http://research.microsoft.com/en-us/people/mickens/

dghf · on Dec 4, 2014

http://research.microsoft.com/en-us/people/mickens/thenightw... in particular is a thing of beauty.

    Indeed, the common discovery mode for an impossibly 
    large buffer error is that your program seems to be 
    working fine, and then it tries to display a string 
    that should say “Hello world,” but instead it prints 
    “#a[5]:3!” or another syntactically correct Perl 
    script, and you’re like WHAT THE HOW THE, and then 
    you realize that your prodigal memory accesses have 
    been stomping around the heap like the Incredible Hulk 
    when asked to write an essay entitled “Smashing 
    Considered Harmful.”

tormeh · on Dec 4, 2014

"Smashing considered harmful" - The Hulk

About C++:

"When it’s 3 A.M., and you’ve been debugging for 12 hours, and you encounter a virtual static friend protected volatile templated function pointer, you want to go into hibernation and awake as a werewolf and then find the people who wrote the C++ standard and bring ruin to the things that they love."

dghf · on Dec 4, 2014

Also about C++:

"One time I tried to create a list<map<int>>, and my syntax errors caused the dead to walk among the living."

limelight · on Dec 4, 2014

I'm definitely recommending that as required reading in all algorithms or distributed systems classes.