A number of Byzantine Generals each have a computer and want to attack the King's wi-fi by brute forcing the password, which they've learned is a certain number of characters in length. Once they stimulate the network to generate a packet, they must crack the password within a limited time to break in and erase the logs, lest they be discovered. They only have enough CPU power to crack it fast enough if a majority of them attack at the same time.
They don't particularly care when the attack will be, just that they agree. It has been decided that anyone who feels like it will announce an attack time, which we'll call the "plan", and whatever plan is heard first will be the official plan. The problem is that the network is not instantaneous, and if two generals announce different plans at close to the same time, some may hear one first and others hear the other first.
They use a proof-of-work chain to solve the problem. Once each general receives whatever plan he hears first, he sets his computer to solve a difficult hash-based proof-of-work problem that includes the plan in its hash. The proof-of-work is difficult enough that with all of them working at once, it's expected to take 10 minutes before one of them finds a solution and broadcasts it to the network. Once received, everyone adjusts the hash in their proof-of-work computation to include the first solution, so that when they find the next proof-of-work, it chains after the first one. If anyone was working on a different plan, they switch to this one, because its proof-of-work chain is now longer.
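A rough sketch of the chaining step described above, assuming SHA-256 and a toy difficulty of 12 leading zero bits so it runs instantly (the story's difficulty would be tuned to take about 10 minutes with everyone working). The names `pow_hash`, `solve`, `build_chain`, `DIFFICULTY_BITS` and the plan string are made up for illustration, not taken from any real implementation.

```python
import hashlib

DIFFICULTY_BITS = 12  # assumption: toy difficulty, far easier than a 10-minute target


def pow_hash(prev_hash, plan, nonce):
    """Hash that commits to the previous solution, the plan, and a nonce."""
    data = f"{prev_hash}|{plan}|{nonce}".encode()
    return hashlib.sha256(data).hexdigest()


def is_valid(hex_digest):
    """A digest is a solution if its top DIFFICULTY_BITS bits are all zero."""
    return int(hex_digest, 16) >> (256 - DIFFICULTY_BITS) == 0


def solve(prev_hash, plan):
    """Brute-force nonces until the hash meets the difficulty."""
    nonce = 0
    while True:
        digest = pow_hash(prev_hash, plan, nonce)
        if is_valid(digest):
            return nonce, digest
        nonce += 1


def build_chain(plan, length):
    """Each proof includes the previous solution, so the proofs chain together."""
    chain = []
    prev_hash = "0" * 64  # hypothetical starting value for the first proof
    for _ in range(length):
        nonce, digest = solve(prev_hash, plan)
        chain.append((prev_hash, nonce, digest))
        prev_hash = digest
    return chain


chain = build_chain("attack at 0300", 3)
for prev_hash, nonce, digest in chain:
    print(nonce, digest[:16])
```

Because every proof hashes over the previous solution, nobody can swap in a different plan without redoing all the work that comes after it.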
After about two hours, the plan should be hashed by a chain of 12 proofs-of-work. Every general, just by verifying the difficulty of the proof-of-work chain, can estimate how much parallel CPU power per hour was expended on it and see that it must have required the majority of the computers to produce in the allotted time. At the least, most of them had to have seen the plan, since the proof-of-work is proof that they worked on it. If the CPU power exhibited by the proof-of-work is sufficient to crack the password, they can safely attack at the agreed time.
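The arithmetic behind that estimate can be sketched like this. It assumes a proof with `difficulty_bits` leading zero bits takes about 2**difficulty_bits hash attempts on average; the difficulty, the per-machine hash rate and the number of generals are invented numbers, only there to make the example run.

```python
def expected_hashes(num_proofs, difficulty_bits):
    """Average number of hash attempts needed to produce the whole chain."""
    return num_proofs * 2 ** difficulty_bits


def machines_needed(num_proofs, difficulty_bits, hashes_per_sec, elapsed_sec):
    """How many machines at the given rate must have worked for elapsed_sec."""
    total = expected_hashes(num_proofs, difficulty_bits)
    return total / (hashes_per_sec * elapsed_sec)


def majority_worked(estimated_machines, total_generals):
    """True if the work implies more than half of the generals took part."""
    return estimated_machines > total_generals / 2


# Hypothetical numbers: 12 proofs in 2 hours, 40-bit difficulty,
# 10 million hashes per second per machine, 300 generals in total.
machines = machines_needed(12, 40, 10_000_000, 2 * 60 * 60)
print(round(machines), majority_worked(machines, 300))
```

This is an expected-value estimate, so a lucky or unlucky run of proofs will be off by some margin; the more proofs in the chain, the tighter the estimate gets.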
I don't quite understand - what happens if two of them find a hash solution at the same time and both broadcast it? Then you have the same problem as before, right?
All users prefer the longest chain. If two solutions are broadcast at about the same time, each miner will probably prefer the solution they received first, and will base their solution attempts on that chain. It is very unlikely that both chains will then each have a new block broadcast at the same time. Whichever chain gets a new block first will be the winner and everyone will switch to that chain, abandoning the now-orphaned chain.
The two chains will fall out of sync before they're finished. The chance of all 12 proofs being completed at exactly the same time, and broadcast to groups of identical computing power every step of the way, is very small.
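The tie-break described here can be sketched in a few lines. Chains are plain lists of block labels and "longest" is just the block count, as in the comment above; the function name `choose_chain` and the block labels are hypothetical.

```python
def choose_chain(current, candidate):
    """Switch only to a strictly longer chain; on a tie, keep the one seen first."""
    return candidate if len(candidate) > len(current) else current


# Two solutions for the same position arrive at about the same time.
chain_a = ["genesis", "A1"]
chain_b = ["genesis", "B1"]

miner_1 = chain_a  # miner 1 heard A1 first
miner_2 = chain_b  # miner 2 heard B1 first

# Each then hears the other solution: same length, so nobody switches.
miner_1 = choose_chain(miner_1, chain_b)
miner_2 = choose_chain(miner_2, chain_a)

# The next block happens to land on chain A and is broadcast to everyone.
chain_a2 = chain_a + ["A2"]
miner_1 = choose_chain(miner_1, chain_a2)
miner_2 = choose_chain(miner_2, chain_a2)

print(miner_1 == miner_2 == chain_a2)  # True: both miners converge on A
```

The strict `>` is what keeps miners on their first-seen chain during a tie and makes B1 the orphan once A2 shows up.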
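To put a rough number on "very small": if each round has some probability of ending in an exact tie, and rounds are treated as independent (a simplifying assumption), the chance of the split surviving every round is that probability raised to the number of rounds. The 5% per-round figure below is invented for illustration.

```python
def fork_survival_probability(p_tie_per_round, rounds):
    """Chance that a tie repeats in every one of `rounds` independent rounds."""
    return p_tie_per_round ** rounds


# Hypothetical: a 5% chance of an exact tie per proof, over 12 proofs.
p = fork_survival_probability(0.05, 12)
print(f"{p:.2e}")
```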
It's simplistic to say that the problem was "solved decades ago". Every existing attempt at Byzantine fault tolerance is limited in one way or another - if you're looking for a solution it's really a matter of choosing a solution based on which set of painful limitations you want to live with.
Bitcoin's solution is a pretty good one for its use case. Most other solutions would have trouble scaling like the blockchain can, but in many applications its very long consensus lead time would be unacceptable.
JAMES: I announce my desire to go to lunch.
BRYAN: I verify that I heard that you want to go to lunch.
RICH: I also verify that I heard that you want to go to lunch.
CHRIS: YOU DO NOT WANT TO GO TO LUNCH.
JAMES: OH NO. LET ME TELL YOU AGAIN THAT I WANT TO GO TO LUNCH.
CHRIS: YOU DO NOT WANT TO GO TO LUNCH.
BRYAN: CHRIS IS FAULTY.
CHRIS: CHRIS IS NOT FAULTY.
RICH: I VERIFY THAT BRYAN SAYS THAT CHRIS IS FAULTY.
BRYAN: I VERIFY MY VERIFICATION OF MY CLAIM THAT RICH CLAIMS THAT I KNOW CHRIS.
JAMES: I AM SO HUNGRY.
CHRIS: YOU ARE NOT HUNGRY.
RICH: I DECLARE CHRIS TO BE FAULTY.
CHRIS: I DECLARE RICH TO BE FAULTY.
JAMES: I DECLARE JAMES TO BE SLIPPING INTO A DIABETIC COMA.
RICH: I have already left for the cafeteria.
Take the succession order of the commander in chief: there's a strict order of succession when shit happens. It's master election vs the generals problem. Personally, I find master election much easier to cognitively reason about.
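For comparison, master election with a fixed succession order is about this simple, assuming everyone shares the same list and agrees on who is still alive (which is the part the generals problem doesn't let you assume). The names `elect_master`, `succession` and the node labels are hypothetical.

```python
def elect_master(succession, alive):
    """Return the first node in the agreed succession order that is still alive."""
    for node in succession:
        if node in alive:
            return node
    return None  # nobody left to take over


succession = ["node-1", "node-2", "node-3", "node-4"]

print(elect_master(succession, {"node-1", "node-2", "node-3", "node-4"}))  # node-1
print(elect_master(succession, {"node-3", "node-4"}))                      # node-3
```

Note this only handles nodes that crash and go silent; a node that lies about being alive, or tells different peers different things, puts you back in Byzantine territory.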
Indeed, the common discovery mode for an impossibly large buffer error is that your program seems to be working fine, and then it tries to display a string that should say “Hello world,” but instead it prints “#a[5]:3!” or another syntactically correct Perl script, and you’re like WHAT THE HOW THE, and then you realize that your prodigal memory accesses have been stomping around the heap like the Incredible Hulk when asked to write an essay entitled “Smashing Considered Harmful.”
"When it’s 3 A.M., and you’ve been debugging for 12 hours, and you encounter a virtual static friend protected volatile templated function pointer, you want to go into hibernation and awake as a werewolf and then find the people who wrote the C++ standard and bring ruin to the things that they love."
* Castro and Liskov, "Practical Byzantine Fault Tolerance", http://www.pmg.lcs.mit.edu/papers/osdi99.pdf As the title says, this paper describes a practical consensus algorithm that tolerates Byzantine failures. In some ways it is provably optimal.
* Lamport, "Leaderless Byzantine Paxos", http://research.microsoft.com/en-us/um/people/lamport/pubs/d... Interesting follow-on from Castro and Liskov, removing the role of the leader.
* Driscoll, "Murphy was an Optimist", http://www.rvs.uni-bielefeld.de/publications/DriscollMurphyv... These things really happen in practice.
* van Renesse et al, "Byzantine Chain Replication", http://www.cs.cornell.edu/home/rvr/newpapers/opodis2012.pdf Very fast replication in a model that allows Byzantine failures.