What happens when the number of servers changes? The cache hit rate would likely drop to zero until it warms up again, which is a good way to accidentally overload your systems.

Load balancing based on consistent hashing is the better way to implement this.




When the number of servers changes, you slowly ramp up from mod n to mod (n+1). Flip a biased coin for each user to decide whether to use n or n+1 buckets, then slowly crank the bias toward the n+1 side.
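A minimal sketch of that ramp in Python (the function name and the per-user seeding are my own illustration, not something from the parent comment):

    import random

    def pick_server(user_id: int, n: int, ramp: float) -> int:
        """Route user_id to a server index while ramping from n to n+1 servers.

        ramp goes from 0.0 (everyone still on mod n) to 1.0 (everyone on mod n+1).
        Seeding the coin with the user id makes each user's flip deterministic,
        so a user switches buckets at most once, when the ramp passes their coin.
        """
        coin = random.Random(user_id).random()  # stable per-user value in [0, 1)
        buckets = n + 1 if coin < ramp else n
        return user_id % buckets

Cranking ramp from 0.0 to 1.0 over an hour or a day moves users over gradually rather than all at once.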


Consistent hashing is a slightly cleaner way to do it, but gives pretty much the same result as modulo-ing the user id against the number of servers. At least as I understand it, you consistently hash something (a user id, a request URL, etc.) into N buckets, where N is the number of servers, so changing N re-shuffles all of the buckets anyway.

Short of something like Cassandra's ring topology, how would you use consistent hashing to add new servers and assign them requests?


You are missing a crucial piece needed for consistent hashing: you also need to hash the names of the servers. With consistent hashing you hash both the request keys and the server names, then assign each request to the server with the closest hash (under the modulus). With this scheme, you only need to remap about 1/n of the keys (where n is the number of servers).
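A minimal sketch of one common variant of that scheme in Python (the HashRing class, the md5 choice, and the replica count are illustrative assumptions, not a reference implementation): each key is served by the first server hash at or after the key's own hash, wrapping around the ring.

    import bisect
    import hashlib

    def stable_hash(value: str) -> int:
        # Any stable, well-distributed hash works; md5 is just convenient here.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class HashRing:
        """Hash both server names and request keys onto a ring; a key is served
        by the first server hash at or after the key's hash (wrapping around)."""

        def __init__(self, servers, replicas=100):
            # Each server gets several virtual nodes to smooth the distribution.
            self._ring = sorted(
                (stable_hash(f"{server}#{i}"), server)
                for server in servers
                for i in range(replicas)
            )
            self._hashes = [h for h, _ in self._ring]

        def server_for(self, key: str) -> str:
            idx = bisect.bisect(self._hashes, stable_hash(key)) % len(self._ring)
            return self._ring[idx][1]

    ring = HashRing(["cache-1", "cache-2", "cache-3"])
    print(ring.server_for("user:42"))

Adding a fourth server only claims the ring segments adjacent to its virtual nodes, so roughly 1/n of the keys move to it and everything else keeps its existing server.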


You're kind of right. You can also use something like jump consistent hash [0] which only requires you to have a consistent ordering of the hosts where you're sending the information. We (Facebook) use something similar for our caches. It requires a linear array of hosts but you've already got that if you're load balancing.

[0] https://arxiv.org/abs/1406.2294
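For reference, the algorithm in the linked paper is only a few lines; here is a Python transcription of the C++ it gives (the explicit 64-bit masking is the only addition):

    def jump_consistent_hash(key: int, num_buckets: int) -> int:
        """Map a 64-bit key to a bucket in [0, num_buckets), per Lamping & Veach."""
        b, j = -1, 0
        while j < num_buckets:
            b = j
            key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF  # 64-bit LCG step
            j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
        return b

There is no ring to store: bucket i is just the i-th host in your ordered host list, which is why it fits the linear-array-of-hosts case described above.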


That makes a lot of sense, thanks.

Better consistent hashing means that existing servers don't have their caches invalidated, but the servers that were just added start with empty caches anyway, so they end up fielding all of the uncached requests. Hopefully the bottleneck is actually in some shared layer behind them (a database or something); otherwise I guess you'd need a more complex way to slowly shift traffic onto the new nodes.


Proper consistent hashing will only move about K/N assignments on average (more precisely, K/(N+1)) when going from N to N+1 servers, for K original assignments.
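A quick empirical check of that claim, assuming the jump_consistent_hash sketch from the comment above is in scope (K and N here are arbitrary):

    K, N = 100_000, 10
    moved = sum(
        jump_consistent_hash(key, N) != jump_consistent_hash(key, N + 1)
        for key in range(K)
    )
    print(moved / K)  # roughly 0.09, i.e. about K/(N+1) of the assignments change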




