
Thanks for the info, that's exactly what I thought the recovery process should be. Node failure isn't bad until it becomes a cascading catastrophe :)

But by "bad" I meant the case where a node is backed by local disks, which is what Cassandra and ScyllaDB usually recommend.

Depending on the time between snapshots (if there are snapshots at all...), the restore-from-snapshot process can be problematic.

Bootstrapping nodes from zero could, depending on the cluster size, take days in Cassandra (big data nodes, while not recommended, are pretty common), because the streaming implementation was (maybe still is?) very bad.




ScyllaDB is luckily much faster. We can rebuild a ~5TB node in a matter of hours.


Doesn't streaming affect the node's network? Isn't that an issue, or do you have a dedicated NIC for inter-node communication? Or do you just limit the streaming bandwidth?

Thanks.


No dedicated NICs; the link speeds are fast enough that we don't really worry about this.

It's also worth mentioning that in a cluster of N nodes, recovering a node means each of the other N - 1 nodes only has to stream roughly 1/(N - 1) of that node's data. So when you look at the cluster as a whole, and at the strain on each node that is UP and serving traffic, it's insignificant.
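To put rough numbers on the 1/(N - 1) point, here is a back-of-the-envelope sketch. It assumes the rebuilt node's data is spread perfectly evenly across the surviving peers and that 1 TB = 10^12 bytes. The function names, the cluster sizes, and the 4-hour rebuild window are made up for illustration; only the ~5TB figure comes from the thread.

```python
def per_peer_share_tb(node_data_tb: float, cluster_size: int) -> float:
    """TB each surviving peer streams, assuming an even split across N - 1 peers."""
    if cluster_size < 2:
        raise ValueError("need at least one surviving peer to stream from")
    return node_data_tb / (cluster_size - 1)


def per_peer_rate_gbit(node_data_tb: float, cluster_size: int, rebuild_hours: float) -> float:
    """Average outbound rate per peer in Gbit/s over the whole rebuild window."""
    share_bytes = per_peer_share_tb(node_data_tb, cluster_size) * 1e12  # decimal TB
    return share_bytes * 8 / (rebuild_hours * 3600) / 1e9


if __name__ == "__main__":
    # Hypothetical cluster sizes and rebuild window; 5 TB is the node size from the thread.
    for n in (3, 6, 12, 24):
        share = per_peer_share_tb(5.0, n)
        rate = per_peer_rate_gbit(5.0, n, rebuild_hours=4.0)
        print(f"N={n:>2}: each peer streams ~{share:.2f} TB, ~{rate:.2f} Gbit/s on average")
```

The per-peer load drops quickly as the cluster grows, which is the argument for not needing a dedicated NIC. The rebuilding node itself still receives the full amount, so its inbound link is the part to watch.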



