KeyDB – A Multithreaded Fork of Redis

dig1 · on May 21, 2023

Without the intention to undermine anyone's work (and I truly appreciate the work the KeyDB guys did), I would not use this (in production) unless I see Jepsen test results [1] either by Kyle or someone supervised by Kyle. But, unfortunately, databases are complex enough and distributed systems even more, and I've lost many sleepless nights debugging obscure alternatives that claimed to be faster in edge cases. And the worst thing is that these projects get abandoned by the original authors after a few years, leaving me with (to quote a movie) "a big bag of odor".

I see databases as a programming languages - if they have a proven track record of frequent releases and responsive authors after 5-7 years, they are usable for wider adoption.

[1] https://jepsen.io/analyses

redm · on May 21, 2023

To counter what the other active business said, we tried using KeyDB for about 6 months and every fear you concern you stated came true. Numerous outages, obscure bugs, etc. Its not that the devs aren’t good, its just a complex problem and they went wide with a variety of enhancements. We changed client architecture to work better with tradition Redis. Combined with with recent Redis updates, its rock solid and back to being an after-thought rather than a pain point. Its only worth the high price if it solves problems without creating worse ones. I wish those guys luck but I wont try it again anytime soon.

* its been around 2 years since our last use as a paying customer. YMMV.

derefr · on May 21, 2023

Actively using KeyDB at work. I would normally agree with you; and for a long time after I initially heard about the project, we hesitated to use it. But the complexity and headaches of managing a redis-cluster, uh, cluster (on a single machine — which is the idiomatic way to vertically scale Redis on a single multicore machine...) eventually pushed us over the edge into a proof-of-concept KeyDB deployment.

Conveniently, we don't have to worry about the consistency properties of KeyDB, because we're not using KeyDB distributed, and never plan to do so. We were only using redis-cluster (again, on a single machine) to take advantage of all cores on a multicore machine, and to avoid long-running commands head-of-line blocking all other requests. KeyDB replaces that complex setup with one that's just a single process — and one that doesn't require clients to understand sharding, or create any barriers to using multi-key commands.

When you think about it, a single process on a single machine is a pretty "reliable backplane" as far as CAP theory goes. Its MVCC may very well break under Jepsen — hasn't been tested — but it'd be pretty hard to trigger that break, given the atomicity of the process. Whereas even if redis-cluster has a perfect Jepsen score, in practice the fact that it operates as many nodes communicating over sockets — and the fact that Redis is canonically a memory store, rather than canonically durable — means that redis-cluster can get into data-losing situations in practice for all sorts of silly reasons, like one node on the box getting OOMed, restarting, and finding that it now doesn't have enough memory to reload its AOF file.

exabrial · on May 21, 2023

If I'm trying to just burn through VC money, and scalability is our #1 problem with our 142 users, would this be a good choice then?

survirtual · on May 21, 2023

How is scalability an issue with 142 users? I am genuinely curious.

Scaling starts being a real issue with 10,000+ users. Pretty straight forward to write a server on rust with a single machine capable of handling around 5,000 users, assuming stateless requests.

Maybe you were making a joke and I missed it.

exabrial · on May 21, 2023

It’s a joke. The parent makes a really solid point and it’s advice you should follow. A boring tech stack is the correct one.

It’s a pretty well documented fact SV startups tend to spend a lot of money on over engineering and making technology decisions based on what’s flash-in-pan popular rather than longevity. Any delta on stability you simply make up with sweat and don’t tell anybody about.

Most startups ideas could be fully implemented in cgi-bin gateway scripts in a few days, but that’s not ‘sexy’. Part of it is a mating dance to VCs: the more hip and bleeding edge you seem, the more competent you appear; despite the inverse is an actual reflection of reality. So my comment is in response to that running joke; in a way, a new unstable database that could disappear off the Internet within six months is a great technology to bass your entire start up on, given the above context.

survirtual · on May 30, 2023

Okay! Thanks for explaining to the benefit of all knuckleheads (like me).

octernion · on May 21, 2023

yeah man it's a real subtle joke. 142 users surely means scalability is the #1 priority.

bhouston · on May 21, 2023

It is a Snapchat project though so it is likely at least somewhat decent compared to a random unfunded project. Maybe snap could pay Jensen to review it?

VWWHFSfQ · on May 21, 2023

I think I'll stay far away from this thing anyway. Numerous show-stopper bug reports open and there hasn't been a substantial commit on the main branch in at least a few weeks, and possibly months. I'll be surprised if Snap is actually paying anybody to work on this.

berkle4455 · on May 21, 2023

[flagged]

wcarron · on May 21, 2023

That's so ridiculously tone deaf, self-righteous, and demeaning. Lots of talented devs work in thousands of companies.

Learn some humility and perspective.

T0pH4t · on May 21, 2023

fwiw, the company I work for has used keydb (multiple clusters) in production for years, under non-trivial load (e.g. millions of req per second). It did serve as a replacement to actual redis clusters, there were major improvements seen by the switch. I can't remember the actual gains as it was so long ago. I do remember it was an actual drop in replacement, as simple as replacing the redis binary and restarting the service. So if you can afford to, maybe try replacing a redis instance or two and see how it goes.

gigatexal · on May 21, 2023

This should be the default mode or thinking for anything that stores your data. Databases need to be bullet proof. And that armor for me is successful Jepsen tests.

hotstickyballs · on May 21, 2023

Given how much of a typical code base does depend on databases I think that’s a good perspective.

29athrowaway · on May 21, 2023

If you only used the versions of databases tested by Jepsen you would have problems worse than data loss, such as security vulnerabilities, because some tests are years old.

Then, has someone independently verified the Jepsen testing framework? https://github.com/jepsen-io/jepsen/

jakewins · on May 21, 2023

This doesn’t seem to be what was suggested? Using an up-to-date database that has had its distributed mechanisms tested - even if the test was a few versions back - is a lot better than something that uses completely untested bespoke algos.

As for verifying Jepsen, I’m not entirely sure what you mean? It’s a non-deterministic test suite and reports the infractions it finds; the infractions found are obviously correct to anyone in the industry that works on this stuff.

Passing a Jepsen test doesn’t prove your system is safe, and nobody involved with Jepsen has claimed that anywhere I’ve seen.

29athrowaway · on May 21, 2023

As a fork, it is essentially a new version of an already Jepsen-tested system. That meets your definition of "a lot better" than "untested bespoke algos".

jakewins · on May 21, 2023

I can’t tell if you are trolling me, but obviously “we made this single-threaded thing multithreaded” implies new algorithms that need new test runs

29athrowaway · on May 21, 2023

I am just evaluating your logic in the form of a reply, so you can see it at work. The same issue you describe happens often across multiple versions of a system.

nemothekid · on May 21, 2023

The part of Redis that were Jepsen stressed, Redis-raft more recently, and Redis sentinel, for which the numerous flaws were pretty much summed at "as designed". No part of KeyDB has gone through a Jepsen style audit, all of which are untested bespoke algos.

bastawhiz · on May 21, 2023

Chrome is based on a fork of Webkit. Are you going to trust it because Safari passed a review before the fork?

29athrowaway · on May 22, 2023

Ask one level higher in the thread.

bastawhiz · on May 21, 2023

> Then, has someone independently verified the Jepsen testing framework?

I don't think it matters. Jepsen finds problems. Lots of them. It's not intended to find all the problems. But it puts the databases it tests through a real beating by exercising cases that can happen, but are perhaps unlikely (or unlikely until you've been running the thing in production for quite a while). Having an independent review does nothing, practically, to make the results of the tests better.

In fact, almost nothing gets a perfectly clean Jepsen report. Moreover, many of the problems that are found get fixed before the report goes out. The whole point is that you can see how bad the problems are and judge for yourself whether the people maintaining the project are playing fast and loose or thinking rigorously about their software. There simply isn't a "yeah this project is good to go" rubber stamp. Jepsen isn't Consumer Reports.

29athrowaway · on May 22, 2023

Have you ever been in a situation when you are writing tests where you get a test failure and realized that your test assertion was wrong?

bastawhiz · on May 25, 2023

Jepsen is more like a fuzzer than a unit test suite. It outputs "I did this and I got back this unexpected result". All of those outputs need to be analyzed by hand.

You don't audit a fuzzer to say "what if it runs the thing wrong". That's not the point of the fuzzer. The point is to do lots of weird stuff and check that the output of the system matches the expectation of what's produced. If the fuzzer outputs a result that's actually expected, then that's easily determined because you have to critically analyze what comes out of the tool in the first place.

ibotty · on May 23, 2023

It doesn't really matter if the Jensen test suite is faulty. If it reports something and it is (against all odds) not a valid bug. What's the problem. It does not claim to find all problems (and can't) so this is sufficient.

didgetmaster · on May 21, 2023

I certainly understand this sentiment. I am building a general-purpose data manager that has many relational database features. It can do a number of things better and much faster than conventional databases. It is currently in beta and available for free download.

But I would be shocked (and worried) if someone tried to use it as their primary database in production. It just doesn't have enough testing yet and is still missing some features.

Instead, I am promoting it as a tool to do tasks like data analysis and data cleaning. That way it gets a good workout without causing major problems if there is a bug.

https://didgets.com/

gizzlon · on May 21, 2023

https://github.com/Snapchat/KeyDB/issues?q=is%3Aopen+is%3Ais... :O

secondcoming · on May 21, 2023

We run into the following one quite regularly:

https://github.com/Snapchat/KeyDB/issues/465

Twirrim · on May 21, 2023

Filed July, eventually marked priority 1 in early December, not a single comment or signs of fix on it since. That doesn't look good at all.

yard2010 · on May 21, 2023

So many open bugs and each one of them sounds worse than the other... Use after free.. I'll wait for the Rust clone ;)

cies · on May 21, 2023

Good point. I remember Redis is designed as single thread as it makes the design simpler.

Now with Rust we can actually manage complexity from multiple threads (if that's even still needed when using an async/evented/eventloop/io_uring-based architecture).

alphanullmeric · on May 21, 2023

[flagged]

henry700 · on May 21, 2023

I can't hear you over the literal backbone of the internet running in Rust inside Cloudflare right now

alphanullmeric · on May 21, 2023

Funny since their GitHub has more Go and C++ than rust, yet they get no credit when cloudflare is boasted as a rust shop by loud rustaceans.

On that topic, go (which is of similar age to rust) is the backbone of docker and kubernetes. C++ is the backbone of unreal engine, Google search, HFT, nvidia, etc. For everything that rust is used, there are a dozen others written in languages with less annoying fans, languages that don’t force you to do [manual name mangling](https://docs.rs/serde/latest/serde/trait.Deserializer.html) or [copy pasting](https://github.com/nautechsystems/nautilus_trader/blob/maste...) and macro nonsense to compensate for poor ergonomics. Turns out “rewrite it you’re doing it wrong” and “<convenient feature from other language> is an antipattern” is not a good solution when real money is on the line. Perhaps rustaceans should stick to what they’re good at (GitHub surveys and spamming threads discussing unrelated topics).

misnome · on May 21, 2023

[Crash]

Piezoid · on May 21, 2023

44 strn?cpy, 287 strlen; SQLite: 20 strn?cpy, 182 strlen in 1.4x C slocs.

mgaunard · on May 21, 2023

Whenever I see someone arguing their code is better because it's multithreaded, I cringe.

Most developers cannot do multithreading correctly, and unless you're particularly good about it it's just going to introduce not only lots of bugs but also performance problems.

The only folks in that space that seem to do it well are ScyllaDB.

intelVISA · on May 21, 2023

Multi-threading easy is it's hard synchronization that's

porsager · on May 21, 2023

Multi-threading is easy it's synchronization that's hard.

gnrlst · on May 21, 2023

I see there what did you.

Alifatisk · on May 21, 2023

Oh, now I got it!

didgetmaster · on May 21, 2023

I have been writing multithreaded code for decades and it is hard. My current database engine can break a single query into tasks to be run in parallel for much faster performance; but finding and fixing bugs is a real challenge.

All it takes is one critical section to not be protected (i.e. locked) to cause a bug. A series of tests can run hundreds of times correctly without detecting the problem. It is only when a context switch happens at a certain microsecond that the error is exposed.

I am a true believer in multithreading as my own code can see tremendous performance gains using it on the latest multi-core CPUs; but tread very carefully when programming in this manner.

ddorian43 · on May 21, 2023

ScyllaDB doesn't do multithreading. It has 1 thread per core.

mgaunard · on May 21, 2023

That's the good way to do it.

And sorry, but that is multithreading, there are several cores.

kykeonaut · on May 21, 2023

Yep, multiple C++ threads each of which is allocated to a single core.

thekozmo · on May 21, 2023

That's not true. Scylla does multi-threading. Scylla is a single process, single address space. It does pin the threads to individual hyper threads but there are additional other workers in the background as well.

ddorian43 · on May 22, 2023

The hot path is message-passing.

Native multi-threading is used when you have functionality that already works on threads and you don't want to port it.

Multi-thread is not used in the hot path.

A single data-part/shard is served by a single thread.

akoncius · on May 21, 2023

not do dismiss anything or what..

but how does this kind of multithreading (one thread per core) is better than proper multithreading (many threads per core)?

denotational · on May 21, 2023

Perhaps you’re thinking about SMT (e.g. Intel Hyper Threading) when you say “proper multithreading”?

I’m not sure it’s valid to say that only SMT is “proper multithreading”, especially since multithreading as a concept predates it by quite a way.

SMT has a quite a few performance issues since resources such as the L1, L2, and branch predictor are shared between the threads, which can lead to contention that hurts the performance of all the SMT threads sharing a physical core.

SMP is no less “proper”, and as core counts have increased significantly on commodity CPUs, the use of spinning threads bound to a single core each has become a common paradigm.

Oversubscription without SMT (i.e. many threads per core) is possible, but unless you have a workload where each thread is I/O bound with a substantial amount of time spent blocking, the overhead of scheduling and context switching means throughput will likely decrease.

mgaunard · on May 21, 2023

All SMT does is allow multiple instruction counters on the same superscalar core. It increases utilization of all compute units and therefore increases throughput.

Of course it increases latency, since those resources are not fully exclusive to a particular thread anymore.

Whether or not it's a good thing depends on what you care about. You could also argue that a good program would be able to saturate a single superscalar core with a single thread and thus wouldn't benefit from SMT at all, but I think that would be hard to guarantee in practice.

denotational · on May 21, 2023

I don’t disagree, I’m just trying to unpack what the GP might have meant by “proper multithreading”.

mgaunard · on May 21, 2023

Doing multiple threads per core is not "proper multithreading", since it's a well-known antipattern.

kristiandupont · on May 21, 2023

..it's not good because it's bad?

Why is it an anti pattern, this is news to me?

janhaa · on May 21, 2023

More threads on a single core equal more context switches, which reduces the effective amount of instructions you can process.

kristiandupont · on May 21, 2023

Well sure, but why would that make it "improper multithreading"? Is polymorphism based on vtables not "proper OOP"? We rely on many abstractions that aren't free in terms of CPU cycles because it makes development easier or less error prone.

And setting up, say, one thread per HTTP request will likely be negligible because blocking I/O is where time is spent anyways..

threeseed · on May 21, 2023

It's not improper it's just non-optimal.

And we have had non-blocking I/O for quite some time now.

mgaunard · on May 21, 2023

Any networking program doing blocking I/O is doing it wrong.

Your I/O should only be done synchronously if it's non-blocking.

Now for disk I/O, it's a more muddy thing, it's actually quite different from networking since it's more transparently managed by the operating system.

mgaunard · on May 21, 2023

Kernel threads do not scale, and neither does the scheduler.

Userland threads (or fibers, or stackful coroutines) do scale better though.

why_only_15 · on May 21, 2023

My limited understanding is that there's less cache thrashing (from multiple different workloads scheduled on the same core) and less scheduler overhead (from less overall threads).

denotational · on May 21, 2023

Just to add that scheduling overhead goes away with SMT (assuming you don’t oversubscribe), but the sharing of caches and branch prediction logic is still an issue as you point out.

bdhcuidbebe · on May 21, 2023

You want databases to be boring. Stick to tried and true stuff so you can focus on your product rather than beta testing fortune-500 crap.

notmypenguin · on May 21, 2023

A) key-value is MUCH simpler to implement than an RDBMS, code wise. It’s arguably more boring and less theoretical than indices on PG or foreign key constraints

B) there’s a lot of tricky stuff with indices on PG and you generally need a DB admin from day 1

C) your comment is probably more appropriate for either layered databases or new fangled stuff like time series or graph db’s etc

ploppyploppy · on May 21, 2023

> B) there’s a lot of tricky stuff with indices on PG and you generally need a DB admin from day 1

Sorry this is absolute nonsense. Any software engineer worth their salary should be comfortable working with RDBMS index concepts and interrogating their relational model to determine best practice and direction for table indexing.

bdhcuidbebe · on May 21, 2023

In this context I mean state of the art db:s like redis or memcached.

silverwind · on May 21, 2023

KeyDB is useful for it's Multi-Master mode, but other than that, I can not recommend it because of how many serious bugs it has.

avinassh · on May 21, 2023

how does it handle conflicts in multi-master mode?

flagrant_taco · on May 21, 2023

I believe it includes A Thunderdome protocol. Two matters ever, one master leaves.

efficax · on May 21, 2023

redis being single threaded is actually a feature since you have guarantees about consistency and serializability

andrewstuart · on May 21, 2023

I’m surprised some Rust enthusiasts have not cloned Redis.

erk__ · on May 21, 2023

There are things like Skytable which have the same usecases as Redis to my understanding, though it is not compatible with redis.

https://github.com/skytable/skytable

andersrs · on May 21, 2023

https://www.youtube.com/watch?v=s19G6n0UjsM

revskill · on May 21, 2023

Because the original Redis is the fastest already ?

andrewstuart · on May 21, 2023

Redis is single threaded, in a world in which 16 cores/32 threads is affordable.

dijit · on May 21, 2023

If your goal is to fully utilise the hardware you have then doing some deterministic sharding on top will likely be good enough.

Honestly, redis makes pretty sane tradeoffs, it's not worth the added complexity to add multi-threading as it would almost certainly slow it down while it does locking, and redis isn't typically CPU bound (except potentially the LUA stuff, but that's up to the user), so being multi-threaded doesn't really help much.

If you need multi-threading, there are other solutions, but of course they are slower w.r.t latency and throughput, since that's the trade-off.

avinassh · on May 21, 2023

Have you checked DragonFly? It's multi threaded and modern Redis, it blows Redis away in performance by an order of magnitude.

Some performance number here - https://www.dragonflydb.io/blog/scaling-performance-redis-vs...

capableweb · on May 21, 2023

As always, "performance" is not just one metric you can measure and it'll "blow" away the competition in all use cases.

For example, if you care most about latency, Redis is still the way to go, while DragonFly seems better at throughput. But, tradeoffs vs tradeoffs and all that yadda yadda.

avinassh · on May 21, 2023

> For example, if you care most about latency, Redis is still the way to go, while DragonFly seems better at throughput. But, tradeoffs vs tradeoffs and all that yadda yadda.

DragonFly is better at latency too. The latency numbers they are showing are measured at the high throughput. If you were to reduce the throughput, the latency number would be even better. From the same post:

> This graph shows that the P99 latency of Dragonfly is only slightly higher than that of Redis, despite Dragonfly’s massive throughput increase – it's worth noting that if we were to reduce Dragonfly's throughput to match that of Redis, Dragonfly would have much lower P99 latency than Redis. This means that Dragonfly will give you significant improvements to your application performance.

Litost · on May 22, 2023

There's an ongoing thread about Dragonfly here - https://news.ycombinator.com/item?id=36018221

tomjen3 · on May 21, 2023

Sure. But how much would Reddis benefit from the extra cores? By adding threads support it would need to add support for locks and then that might make it slower. Besides, Reddis isn't usually the slow part of what you are doing.

It might be interesting to have, say, a readonly slave builtin as a second threat that might return outdated information, but I doubt how much use you would get out of it.

I am kinda struggling to come up with a scenario where a significant part of the computational need of you app was in Reddis.

RhodesianHunter · on May 21, 2023

It's a key value store, locks can be done with bucketing fairly trivially like any modern concurrent hashmap and locking would only slow you down if you were frequent writing to the same bucket.

mperham · on May 21, 2023

Redis has MULTI transactions and many applications depend on that functionality. You can't add multiple reader threads without a hugely complex locking system to prevent things like uncommitted reads. This is the tradeoff Redis made when deciding to stay single threaded at its core.

eternalban · on May 21, 2023

Not clear how a rust "clone" fits into this. I wonder if people have tried pinning instances to cores and treating the ensemble as a host device bounded distributed variant. (just musing, haven't looked at Redis for ages.)

secondcoming · on May 21, 2023

We do this as a poor man's cluster... a python script starts a redis-server process on each core and the 'master' process has a key that lets clients know about the other processes running on the machine.

It only really works well if the client can shard the redis command to the right process itself.

colesantiago · on May 21, 2023

There is also DragonflyDB which claims to be faster than both Redis and KeyDB.

https://www.dragonflydb.io/

Kiro · on May 21, 2023

This comment made me think Dragonfly is a much better choice:

"We use keydb at work, and I absolutely do NOT recommend it due to its extreme instability, in fact we're currently in the process of switching to dragonfly precisely due to keydb's instability."

https://news.ycombinator.com/item?id=35990897

lopkeny12ko · on May 21, 2023

We evaluated DragonflyDB for Memcache. It was repeatably orders of magnitude slower under default configurations than original Memcache, using their own benchmark setup.

Either they didn't even test their own product, lied entirely about the performance, or got the marketing department to write the copy without any input from the development department.

romange · on May 21, 2023

I personally benchmarked Dragonfly vs Memcached. Are you calling me a liar? :)

Do you think I also photoshopped this document? https://github.com/dragonflydb/dragonfly/blob/master/docs/me...

lopkeny12ko · on May 21, 2023

I assume you are the lead developer or someone in an exeuctive position associated with Dragonfly. Your defensive, holier-than-thou attitude and tone here and elsewhere is another reason we decided not to adopt Dragonfly internally.

Yes, your results are either inaccurate or deceptive at best. I challenge you run to memcached, under all default settings, and Dragonfly, under all default settings, and memtier_benchmark, under all default settings. Performance is reproducibly orders of magnitude slower, and Dragonfly is also much less efficient--consuming more than double the CPU usage for the same workload.

We also created a test Dragonfly cluster mirroring a small percentage of production traffic in order to do a side-by-side comparison with Memcache. Dragonfly consumed 47% higher CPU usage and regressed P99 latency by 22%. Perhaps our workload is unique, but claiming Dragonfly outperforms Memcache the way you do in your marketing material is an outright lie.

romange · on May 21, 2023

I apologize. I must say that my tone, as you rightly wrote, was inappropriate. Indeed, I am the lead developer for Dragonfly. As such, I am deeply concerned with the performance aspects of our product. Dragonfly claims to be a drop-in, better performant replacement for Redis and Memcached. Every test & benchmark we've run on multiple cpus reinforced that. I've never faked or tweaked any of these benchmarks. That is, of course, not an excuse, and is why I opened by apologizing. I'd like to take this opportunity, if I can kindly ask so, to learn what made your results differ so much from ours. I'll personally try to reproduce what you described. If you could also reach out to me, I'd be happy to learn more about the environments in which you've conducted the aforementioned tests.

bionsystem · on May 21, 2023

Are they linked in any way with DragonflyBSD ?

fweimer · on May 21, 2023

Unlikely, it's not even open source: https://github.com/dragonflydb/dragonfly/blob/main/LICENSE.m...

Buttons840 · on May 21, 2023

I've never seen this license. It looks like it will be open source under the Apache license after 5 years though. That is, I can save the code now, and in 5 years I can do whatever I want with it (under the Apache license terms). When open-source isn't an option, then this is the next best thing.

solarkraft · on May 21, 2023

Huh, interesting concept. It's definitely a big step above "source available". It even kind of allows maintaining a community version with outside patches, albeit slowly.

It kind of rivals the KDE/Qt deal of "freely licensed when the company goes under" in its effects of the code eventually being community-maintainable once the company doesn't care for it anymore.

5 years is a bit much though.

rollcat · on May 21, 2023

It is open source, you can browse the source all you like.

It's not free software.

BSL-style licenses seem to be a popular choice for databases, thanks to AWS.

erk__ · on May 21, 2023

Well "source available" would probably be better. It is not open source by the definitions most follow in this case the OSI definition. Eg it goes against §6 of their definition. https://opensource.org/osd/

Un1corn · on May 21, 2023

Please don't try to delude people by changing the definition of open source. While sadly Open Source Initiative were not able to get the trademark for open source, the de-facto definition of open source is practically the same as free software.

Dragonfly is source available which is a completely different thing.

rollcat · on May 21, 2023

> Please don't try to delude people by changing the definition of open source.

Don't blame it on me, that ship has sailed over two decades ago. That's why RMS didn't like the term in the first place. Even if I disagree with RMS on most things, I have to admit I'm 100% with him on this one. It's almost as if the term was coined to create this kind of confusion.

In my opinion, the mental gymnastics around the definition of "open source" led to abominations like CDDL, which was carefully and explicitly designed to make it impossible/impractical/illegal to properly integrate ZFS or DTrace with Linux. CDDL is perfectly "open source" by definition, but its primary purpose was to lock people out of actually using software licensed under it, unless they happen to be running Solaris.

In all this mess, I actually think BSL is cool. It's a legally binding vow to actually make a particular release free (as in freedom) down the line. They could have kept it proprietary (which I think is totally fair), or made vague promises instead.

jen20 · on May 21, 2023

> but its primary purpose was to lock people out of actually using software licensed under it, unless they happen to be running Solaris.

And yet here we are, with DTrace (CDDL) shipping in macOS, ZFS having shipped in OS X for several releases, and FreeBSD shipping both. Even Windows (on the "insider" builds) has DTrace [1] _shipped by Microsoft_.

That makes any argument that you can't use any of this stuff unless using Solaris looking rather... wrong - and the idea that Sun lawyers would have overlooked FreeBSD, macOS or Windows if the goal were to restrict the software to be used in Solaris is laughable.

In the case of CDDL specifically, even RMS [2] refers to it as a "free software license", though not one which is GPL-compatible.

[1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/d...

[2]: https://www.gnu.org/licenses/license-list.en.html#CDDL

rollcat · on May 21, 2023

> And yet here we are, with DTrace (CDDL) shipping in macOS, ZFS having shipped in OS X for several releases, and FreeBSD shipping both. Even Windows (on the "insider" builds) has DTrace [1] _shipped by Microsoft_.

That's why I personally strongly prefer BSD systems (OpenBSD in particular) and permissively-licensed software.

> That makes any argument that you can't use any of this stuff unless using Solaris looking rather... wrong

The intent was to lock out Linux specifically, otherwise they would've used a more restrictive license.

> [...] and the idea that Sun lawyers would have overlooked [...]

You're not violating the CDDL by linking it with GPL-licensed software, you're violating the GPL. Which goes to show just how devious that move was: even if Sun went belly up with no lawyers left to lift a finger, relicensing Linux with a CDDL linking exception would still be a massive clusterfuck. So Ubuntu & whoever else is shipping zfs.ko is risking getting sued by any of the half a million people who have their code in the kernel.

> In the case of CDDL specifically, even RMS [2] refers to it as a "free software license", though not one which is GPL-compatible.

You can also license your software even more permissively, but hold a patent on it, and not grant a patent license to your users. It would technically be free, but still released with an intent of restricting the freedom of certain users.

bcantrill · on May 22, 2023

As has been discussed many times on HN before[0], your read of history here is just wrong: we at Sun certainly did not think that Linux would let their own read of the GPL prevent them from integrating DTrace. More generally, other faults aside, Sun was emphatically not "devious"; as I have quipped in the past, one of Sun's greatest strengths was that it was insufficiently organized to be evil.

[0] https://news.ycombinator.com/item?id=11176361

rollcat · on May 23, 2023

> [...] one of Sun's greatest strengths was that it was insufficiently organized to be evil.

That gave me a good laugh. Fantastic bit of insight. I will have to study this case further, thank you for the enlightenment. <3

xyzzy_plugh · on May 21, 2023

That's just like, your opinion, man.

I think the parallels to free software are markedly correct. They're just words after all. It will forever be used in ways incompatible with the OSI definition, showing up after every misuse to correct folks isn't helpful.

You meant free as in beer, right?

rafaelturk · on May 21, 2023

Looks really cool, but What is the goal? Redis already something instantly fast, and single thread brings a lot of benefits. Also Redis is multithreaded in a few smart things (sync to disk is a separated thread)

blablablub · on May 22, 2023

For people who like multitreaded forks, you can also have a look at tendis and kvrocks https://github.com/Tencent/Tendis https://github.com/apache/incubator-kvrocks

teaearlgraycold · on May 21, 2023

Anyone using an obscure database in production is about to learn quite a few lessons they really need to learn.

brunoluiz · on May 21, 2023

Out of curiosity: has anyone found or know the list of trade-offs? There is not free lunch, and the single-threaded model gives certainly consistency guarantees… but what else?

joshxyz · on May 21, 2023

how does this compare to dragonfly which is also redis alternative?

secondcoming · on May 21, 2023

Looks like the person behind KeyDB has lost interest, but I was just made aware of dragonflydb.io.

ivolimmen · on May 22, 2023

So how does this compare to: edis, dragonfly, vedis, iceFireDb, qdb and rlite?

Patrickmi · on May 21, 2023

Honestly I’ve haven’t seen “The Redis Alternative” there’re some out there but it feels like “oh this is meant for us but it’s open source”

xerxes901 · on May 21, 2023

I winced when I saw the phrase "multithreaded fork" but then I realised this is the _other_ kind of fork, not the fork(2) syscall.

SillyUsername · on May 21, 2023

I'm confused. Redis is a cache with persistent backing option. This is a database. Have I missed something?

kristoff_it · on May 21, 2023

People think Redis is just a cache because the use the puggified version sold by AWS (elasticache). Enable persistence + AOF (the write ahead log) and you got yourself a database.

You can also add your own data structures to Redis, not as a form of syntax sugar over KV pairs, but as a dynamic library that you can write in C/C++/Zig/Rust, where you have full control over the in-memory representation.

But that's also another feature AWS takes away from you if you buy elasticache :^)

VWWHFSfQ · on May 21, 2023

I definitely don't consider redis a database since it's not acid compliant and any data you put in you have to be OK with losing. It's still just a cache to me.

kristoff_it · on May 22, 2023

If you think stuff like mongo is any better than redis, you fell for the marketing friend :^)

VWWHFSfQ · on May 22, 2023

We're talking about Redis

shawabawa3 · on May 21, 2023

Redis is a database

It's often used as a cache because it does key-value storage in memory well

SillyUsername · on May 21, 2023

It rebranded in 2015. It used to describe itself as a cache (earlier revisions also declare themselves to be a memory store) https://web.archive.org/web/20150304014343/http://redis.io/

I consider it to be cache first, "db" second, with true definition of db first being something that can execute SQL or SQL like statements (such as Cassandra's CQL). It's the same reason I don't call Cassandra a cache, although it can achieve the same result.

JohnBooty · on May 21, 2023

    true definition of db first being something that 
    can execute SQL or SQL like statements

Pedantic note: the term "database" existed long before the relational model or SQL existed. Many of the dominant databases of the 80s and 90s (dBase, etc) would not fit your invented definition.

Additionally, a lot of "toy" databases like Access can execute SQL statements, so the ability to execute SQL statements isn't necessarily a great way to tell what's a "real" database.

In practical terms, I do agree with you -- if somebody in 2023 is referring to "the database" in their app they had darn well better be talking about something robust and ACID-compliant like Postgres or whatever.

viralpraxis · on May 21, 2023

Your definition does not make any sense. Imagine defining « prime numbers » as {2,3,5} set

Zip3d · on May 21, 2023

It didnt use to be and is a nosql db now

patrec · on May 21, 2023

> Redis is a database

Do you have an example of an use case for which using Redis as a database works significantly better than using say Postgres or MySQL?

byby · on May 21, 2023

Side question about redis:

Does redis do async operations? I'm not sure how that would work because it's known to be in memory. But I do know it persists to disk. So basically my question is:

Does almost absolutely every operation on a redis database happen serially? Maybe not every single operation, but in general.

T0pH4t · on May 21, 2023

Sort of. To my knowledge, the original redis was literally signal threaded and performed io on the same threads as managing the DB (asynchronously of coarse). Then after either redis 5 or 6 they added IO threads to redis such that the DB stayed managed by a single thread and network/disk IO happened on the IO threads.

byby · on May 21, 2023

It makes sense for disk persistence to be io but redis is primarily in memory. So I don't think async applies to that aspect of it. In theory that makes it so all operations on the database from the user perspective are serial.

Not sure about this that's why I'm asking here for a definitive concrete answer about this.

drwiggly · on May 21, 2023

Every user/client operation is serial.

byby · on May 21, 2023

This is mainly because it's single threaded and operates on in memory state. Async io doesn't apply to in memory so all operations have to be serial, is this characterization correct? Thanks.