Hey, great stuff! How do you implement high availability?

wumpus · on April 25, 2012

There are 2 parts to our high availability.

First is on our frontend side. We have 2 nginx servers (using linux HA and vips) which send traffic out to the nodes of the cluster which are up, retrying to a different node in the case of failure or a slow reply.

Deeper in the system, there are 3 copies of every piece of data.

Both of these are fairly normal mechanisms; the 3-copy thing is used at Google and by Hadoop and friends.

jsrfded · on April 25, 2012

Within the datastore, there are 3 copies of each piece of data. When a get() request is made, it goes out to the "closest" copy; if an answer isn't heard from by some threshold, a 2nd request is made to one of the other replicas. Whoever gets the data back first wins.