Thanks for explaining. Your project looks very interesting. I will be definitely following it.
Can you explain some more about how node failure are handled? And whether there is a way to tune that. I.e more about the phrase "less than f nodes fail at any give point in time".
Is that tunable and how dynamic is that. (Set once or dynamic setting? Also set per running cluster, per name space, per key-value pair?)
Also, how is this f number related to the total number of machines in the cluster and how is it related to the number of data replicas?
f is the number of failures that the system can tolerate. Typical systems can tolerate f failures with either (f+1) or (2f + 1) replicas.
HyperDex can tolerate f failures with f+1 replicas. This failure threshold is per region of the hyperspace. Within each region, servers are arranged in a chain. As servers fail they are removed from the chain. As new servers come into the cluster they are appended to the end of one or more chains. The "less than f nodes fail at any given point in time" translates to "at least one node in the chain is alive". Notice that chains are of length f+1 while only f nodes can fail, guaranteeing that one node is alive.
This will be tunable parameter. Right now, it is set when a new space is created, but in an upcoming release this will be totally dynamic and changeable on a per-region (of the hyperspace) basis.
I'd recommend f values of at most three. If failures are totally independent, this will be sufficient to last for millions of years. Thus, it is independent of the number of machines (once you have a sufficient number of machines).
Most of the decent providers will help you improve failure independence. I know first hand that Linode is very accommodating for this.
I mention independent failures because f=3 is sufficient for this scenario for most. If there is correlation between failures and you cannot move servers around then you can compensate with a higher f.
In a future release (within the next six months), we'll be adding support for consistent snapshots that guarantee that HyperDex can withstand more than f failures with a bounded amount of data loss.
It's always possible to retrieve all data with an empty search and manually dump it into another form.
Can you explain some more about how node failure are handled? And whether there is a way to tune that. I.e more about the phrase "less than f nodes fail at any give point in time".
Is that tunable and how dynamic is that. (Set once or dynamic setting? Also set per running cluster, per name space, per key-value pair?)
Also, how is this f number related to the total number of machines in the cluster and how is it related to the number of data replicas?