RedisRaft (github.com/redislabs)
144 points by anhldbk on May 7, 2023 | 50 comments



A gentle reminder that FoundationDB exists and has this nailed down really well. They are just bad at marketing, so it's not in fashion. But do check it out if you want a distributed database with strict serializable semantics, that works.


>bad at marketing

I am guessing maybe it's Apple's corporate secrecy that is the issue. Apple likely has a massive deployment of this tech.


There are some huge deployments out there. Snowflake (the database) has a big dependency on it for how they do metadata, for example.


Also, notably it backs Deno's new KV Store. https://deno.com/blog/kv#run-locally-or-managed


Exactly. I saw that thing tucked away in the diagram for Snowflake when surveying the so-called lakes - by all accounts FDB is an excellent piece of tech.


Strongly consistent FoundationDB = Likely similar write performance to CockroachDB or TiDB when you avoid secondary indexes.

Secondary indexes in "distributed strongly consistent" systems is what ruins performance: because each index is +1 write to another "index table" (... +1 raft quorum check!).

I don't think FoundationDB has "secondary indexes" to begin with, so one may never run into the +1 write per index issue... it's just a layer on top of TiKV (equivalent to RocksDB in CockroachDB).


I am not sure if I am quite getting your point here, but if we're talking about indexes, I've just implemented indexing using FoundationDB and writing to the index happens in the same transaction as the main data write (not really sure why it would ever be otherwise). Definitely not "+1 raft quorum check".

Calling FoundationDB "just a layer on top of TiKV" is… well… :-)
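
For what it's worth, here is roughly what that single-transaction index write looks like with the official FoundationDB Python bindings; the key layout and field names are made up for illustration:

    import fdb

    fdb.api_version(630)
    db = fdb.open()

    @fdb.transactional
    def set_user(tr, user_id, email):
        # Primary record: ("user", user_id) -> email
        tr[fdb.tuple.pack(("user", user_id))] = email.encode()
        # Secondary index entry: ("email_idx", email, user_id) -> ""
        # Written in the same transaction, so it commits (or fails) atomically
        # with the primary write - no separate quorum round for the index.
        tr[fdb.tuple.pack(("email_idx", email, user_id))] = b""

    set_user(db, 42, "alice@example.com")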


I can't speak to FoundationDB specifically, but in general, why would secondary indexes require a second quorum check?

Can't secondary indexes just be implemented in your state machine?

I.e. your state machine handles the insert and also writes a secondary index at the same time. Every state machine could do this identically off the log.
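
A minimal sketch of that idea (all names made up): the apply step writes the record and its index entry together, deterministically, on every replica, so no extra consensus round is needed for the index.

    class KvStateMachine:
        def __init__(self):
            self.users = {}      # user_id -> record (primary data)
            self.by_email = {}   # email -> user_id (secondary index)

        def apply(self, entry):
            # `entry` was already committed by the consensus layer; every
            # replica applies it identically off the log.
            if entry["op"] == "put_user":
                uid, record = entry["user_id"], entry["record"]
                old = self.users.get(uid)
                if old is not None:
                    self.by_email.pop(old["email"], None)  # drop stale index entry
                self.users[uid] = record
                self.by_email[record["email"]] = uid

    sm = KvStateMachine()
    sm.apply({"op": "put_user", "user_id": 1, "record": {"email": "a@example.com"}})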


> Can't secondary indexes just be implemented in your state machine?

Secondary indexes are implemented as an additional smaller table in most databases (Index -> Primary Key), which requires a write, too.

Don't take my word for it- run some benchmarks yourself to see the performance cliffs appear in these "distributed strongly consistent" databases when you have a handful of indexes.

It's why many require 10x the hardware to scale past the performance of a single node postgres / mysql.

> Calling FoundationDB "just a layer on top of TiKV" is… well… :-)

FoundationDB recommends running TiDB on top to get an SQL layer...


> FoundationDB recommends running TiDB on top to get an SQL layer...

You might be mixing up projects here - TiDB/TiKV are completely separate projects from FoundationDB. I don't think FoundationDB has a supported SQL layer.


> Secondary indexes are implemented as an additional smaller table in most databases (Index -> Primary Key), which requires a write, too.

> Don't take my word for it- run some benchmarks yourself to see the performance cliffs appear in these "distributed strongly consistent" databases when you have a handful of indexes.

I'm not saying secondary indexes aren't a performance cliff. I'm saying I can't think of a reason they must require an additional quorum round on top of the one for the initial write that triggers the secondary index update.

Since you said:

> Secondary indexes in "distributed strongly consistent" systems is what ruins performance: because each index is +1 write to another "index table" (... +1 raft quorum check!).


They will be partitioned by a different key. You may in fact have to touch two extra quorums - one to delete the old entry and one to insert the new entry.

This depends on the specifics of how your database works, of course. I’m not sure this would involve multiple consensus quorums in FDB for instance, given that I believe it instead relies on a centralised timestamp service.


I don’t think there is any officially recommended way to run SQL on top of FoundationDB. I suspect the TiDB comment you saw was someone speculating about an idea, maybe (I vaguely recall some GitHub issues about it).

There are some well-supported document interfaces like the official Record Layer or Tigris, though.


You're right, my mistake.


I would love to see a Jepsen test of this when it's ready. The Redis Cluster evaluation [1] was a great read.

[1]: https://aphyr.com/posts/283-jepsen-redis


FYI, Jepsen already evaluated a development build of RedisRaft, and the write-up was a good one: https://jepsen.io/analyses/redis-raft-1b3fbf6

Will indeed be interesting to see the analysis once it becomes stable.


“What would Kyle have to say about this?” was the first thought that came to my mind.


Linking to the introduction bypasses the prominent note in the readme:

> RedisRaft is still being developed and is not yet ready for any real production use. Please do not use it for any mission critical purpose at this time.


This is essentially a complete fabrication. If it's on a public repo, ready or not, someone will use it for prod.


It's not a fabrication; it's a statement that if you use this in prod and it catches fire, that's on you, buddy.

This is like saying laws around seatbelt wearing are a fabrication because some people ignore them.


Fabrication means lie. What are they lying about?


That it's not ready for prod when it's a public repo on GitHub.


Why does the code being public mean it is ready for prod? I feel you have a large misunderstanding.

Code being ready for production has nothing to do with code being shared privately nor publicly.


Why choose this over etcd? Especially if it's a limitation / non-goal to support all Redis commands, or to respond with Redis-like quick performance? Why not go with the battle-hardened, proven option (it's the backing datastore in Kubernetes)?


I am not sure either. But this might overcome etcd's soft storage limit of 8 GB? [1]

[1] https://github.com/etcd-io/etcd/issues/9771


I've been watching this project for a long time. It was supposed to be released with Redis 7 [1]. But I guess this is not true anymore. And there is no public roadmap saying when it will be production ready.

[1] https://www.zdnet.com/article/redis-labs-unveils-redis-datab...


I made my own distributed JSON over HTTP database back in 2016.

It has been running in an intercontinental production environment with 100% read uptime since 2017.

It's 2000 lines of code: http://root.rupy.se (this test env. has 3 nodes: fem, six and sju)


2000 lines of Java. Do performance tests show worst-case latency impact of GC pauses?


Worst is 0.7 seconds for save and 0.1 for load it seems.

Average is 4.8 millisec and 0.5 millisec respectively.

But those are just the JVM doing its thing.

The numbers I like are these: 190.2 158 331

200ms global save average, 158ms min and 331ms max from Europe to east+west US and Asia. Without fault, so consistent; much of that can be attributed to AWS improving so much over the years.

As to why load/save are slower: the complete global round-trip stats I only have for registers, which are rare now, so they don't hit the GC, I'm guessing.

The thing I'm most proud of is async-to-async, meaning the system will saturate all cores (without io-wait) on all machines without problems... it just keeps solving the problem at 100% efficiency, with no memory leaks and 5 years of uptime without any crashes.

Slowing down is the worst case, and if that is a problem, just upgrade the instance type; no uptime during the upgrade, though.


I did that once by hosting JSON files on a web server in a different country.


> A cluster may lose up to (N/2)-1 nodes

What a weird notation. When N=3, a cluster may lose up to 1 node; I don't know how that matches this formula.
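
For reference, the usual way to put it is in terms of majority quorums (a sketch, not the repo's wording):

    # A cluster of N nodes needs a majority (N // 2 + 1) up to make progress,
    # so it tolerates (N - 1) // 2 failures: N=3 -> 1, N=5 -> 2, N=7 -> 3.
    def tolerable_failures(n: int) -> int:
        return (n - 1) // 2

    print([(n, tolerable_failures(n)) for n in (3, 5, 7)])  # [(3, 1), (5, 2), (7, 3)]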


Tweaked the language a bit. Thanks for pointing it out.


I am looking at KeyDB and considering using it as a replacement for Redis. Besides some speed improvements, it has good-looking replication and cluster solutions. https://docs.keydb.dev/docs/cluster-spec


We thought the same and deployed KeyDB to production as a replacement for a big Redis deployment (200+ GB of memory), and we ran into many unpleasant issues with it - very high replication latency, instability, random crashes, memory leaks, etc. So I'd advise you to do thorough testing before you use it in production.


We start tests in the coming week. Current Redis memory use is about 70 GB. Thanks a lot for your comment. I hope to create a stable KeyDB environment, as it would solve some of the problems we have with Redis replication. The issues you describe sound scary.


Would you be so kind as to share the results of your tests, please?


It may have improved, but KeyDB has a number of issues for common Redis use cases. E.g., if you're using Redis as a task queue (typically BRPOP), you'll encounter a race condition in which each KeyDB instance makes a new task available to listening workers on all nodes, resulting in duplicated tasks.
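
For context, this is the standard worker pattern in question (a sketch using redis-py; the queue name is made up). It assumes each pop hands a task to exactly one worker, which is the assumption the race above breaks.

    import redis

    r = redis.Redis(host="localhost", port=6379)

    while True:
        # Blocks until a producer does LPUSH onto "tasks"; returns (queue, payload).
        _queue, task = r.brpop("tasks")
        print("got task:", task)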


I attempted to use KeyDB precisely for its replication and clustering, but was forced to switch to Redis HA. Too many issues getting it to work in a stable way.


But Raft isn't strongly consistent; it has known liveness issues.

https://decentralizedthoughts.github.io/2020-12-12-raft-live...


What does strongly consistent have to do with liveness? If there's a connection it seems pretty indirect.


The article you linked to says that if you have PreVote and CheckQuorum, it then doesn't have liveness issues.


That is about availability in the CAP theorem, not consistency though.


Raft is a pretty decent -- not great -- consensus algorithm (IMHO) but it is used because it is easy to understand. If I had to trust one, I would probably go with Multi-Paxos, if you could successfully implement it.


AWS’ new MemoryDB also seems to be a strongly consistent Redis cluster service. Anyone know how they compare?

https://aws.amazon.com/memorydb/features/


MemoryDB offers strong consistency only at a single node (the primary).

MemoryDB seems to have a very similar architecture to that of AWS Aurora, which separates the storage layer from the compute nodes; consistency is implemented not by communicating between compute nodes but by offering a consistent distributed storage layer. This architecture usually doesn't have multi-node strong consistency by itself and can have replicas.

This means that in MemoryDB only the primary node is strongly consistent, but the replica nodes aren't.

Instead, in my experience, those kinds of AWS offerings have fewer operational headaches, because the storage remains safe even if the primary node fails and you don't need to worry about managing distributed nodes.

Edit: add pros


My understanding of MemoryDB is that it basically replaces the AOF with a distributed log (it might be Kafka/Kinesis, but it could just be backed by the same data layer as Aurora). The biggest win there is that acknowledged writes are not lost if the writer node dies. A reader can replay the log and get fully caught up as it is promoted during a failover.

This comes at a cost, though, and writes are slower than with traditional Redis.


Game changer if you can turn Raft checks on/off on a per-query basis, like ScyllaDB/Cassandra.


Please elaborate.


Cassandra (and ScyllaDB, which has the same data model) allows for a customizable consistency level on a per-query basis. You can send one write with only a single node confirming while sending another that requires full cluster acknowledgement.

More details: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml...
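
Roughly, with the DataStax Python driver (the keyspace, table, and values are made up for illustration):

    from cassandra.cluster import Cluster
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    # Assumes a keyspace "demo" with a table events(id int PRIMARY KEY, payload text).
    session = Cluster(["127.0.0.1"]).connect("demo")

    # Cheap write: a single replica acknowledgement is enough.
    fast = SimpleStatement(
        "INSERT INTO events (id, payload) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ONE,
    )
    session.execute(fast, (1, "cheap write"))

    # Strict write on the same session: every replica must acknowledge.
    strict = SimpleStatement(
        "INSERT INTO events (id, payload) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ALL,
    )
    session.execute(strict, (2, "strict write"))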


Does this have any benefit over Mnesia?



