We’ve interacted before on Twitter and GitHub, and I want to address your point about Raft in Typesense since you mention it explicitly:
I can confidently say that Raft in Typesense is NOT broken.
We run thousands of clusters on Typesense Cloud serving close to 2 Billion searches per month, reliably.
We have airlines using us, a few national retailers with 100s of physical stores in their POS systems, logistic companies for scheduling, food delivery apps, large entertainment sites, etc - collectively these are use cases where a downtime of even an hour could cause millions of dollars in loss. And we power these reliably on Typesense Cloud, using Raft.
For an n-node cluster, the Raft protocol only guarantees auto-recovery for a failure of up to (n-1)/2 nodes. Beyond that, manual intervention is needed. This is by design to prevent a split brain situation. This not a Typesense thing, but a Raft protocol thing.
I can confidently say that Raft in Typesense is NOT broken.
We run thousands of clusters on Typesense Cloud serving close to 2 Billion searches per month, reliably.
We have airlines using us, a few national retailers with 100s of physical stores in their POS systems, logistic companies for scheduling, food delivery apps, large entertainment sites, etc - collectively these are use cases where a downtime of even an hour could cause millions of dollars in loss. And we power these reliably on Typesense Cloud, using Raft.
For an n-node cluster, the Raft protocol only guarantees auto-recovery for a failure of up to (n-1)/2 nodes. Beyond that, manual intervention is needed. This is by design to prevent a split brain situation. This not a Typesense thing, but a Raft protocol thing.