
If you only used the versions of databases tested by Jepsen, you would have problems worse than data loss, such as security vulnerabilities, because some of the tests are years old.

Then, has someone independently verified the Jepsen testing framework? https://github.com/jepsen-io/jepsen/




This doesn’t seem to be what was suggested? Using an up-to-date database that has had its distributed mechanisms tested - even if the test was a few versions back - is a lot better than something that uses completely untested bespoke algos.

As for verifying Jepsen, I’m not entirely sure what you mean? It’s a non-deterministic test suite that reports the infractions it finds; the infractions found are obviously correct to anyone in the industry who works on this stuff.

Passing a Jepsen test doesn’t prove your system is safe, and nobody involved with Jepsen has claimed that anywhere I’ve seen.


As a fork, it is essentially a new version of an already Jepsen-tested system. That meets your definition of "a lot better" than "untested bespoke algos".


I can’t tell if you are trolling me, but obviously “we made this single-threaded thing multithreaded” implies new algorithms that need new test runs.


I am just evaluating your logic in the form of a reply, so you can see it at work. The same issue you describe happens often across multiple versions of a system.


The parts of Redis that Jepsen stressed were Redis-Raft more recently, and Redis Sentinel, for which the numerous flaws found were pretty much summed up as "as designed". No part of KeyDB has gone through a Jepsen-style audit; it's all untested bespoke algos.


Chrome is based on a fork of WebKit. Are you going to trust it because Safari passed a review before the fork?


Ask one level higher in the thread.


> Then, has someone independently verified the Jepsen testing framework?

I don't think it matters. Jepsen finds problems. Lots of them. It's not intended to find all the problems. But it puts the databases it tests through a real beating by exercising cases that can happen, but are perhaps unlikely (or unlikely until you've been running the thing in production for quite a while). Having an independent review does nothing, practically, to make the results of the tests better.
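To make "exercising cases that can happen" concrete, the general shape is roughly this (a toy sketch, not Jepsen itself, which is written in Clojure; the history format and the checker here are entirely made up): record a history of operations performed while faults are being injected, then check that history after the fact:

    # Toy history checker in the Jepsen spirit: operations are recorded as they
    # happen, then the history is analyzed afterwards. Hypothetical structure;
    # real Jepsen histories and checkers are far richer.
    history = [
        {"process": 0, "type": "write", "value": 1, "ok": True},
        {"process": 1, "type": "write", "value": 2, "ok": True},
        {"process": 2, "type": "read",  "value": 3, "ok": True},  # never written!
    ]

    def check_reads(history):
        """Flag any successful read of a value no client ever wrote."""
        written = {e["value"] for e in history if e["type"] == "write" and e["ok"]}
        return [e for e in history
                if e["type"] == "read" and e["ok"] and e["value"] not in written]

    print("anomalies:", check_reads(history))  # these are what a human then analyzes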

In fact, almost nothing gets a perfectly clean Jepsen report. Moreover, many of the problems that are found get fixed before the report goes out. The whole point is that you can see how bad the problems are and judge for yourself whether the people maintaining the project are playing fast and loose or thinking rigorously about their software. There simply isn't a "yeah this project is good to go" rubber stamp. Jepsen isn't Consumer Reports.


Have you ever been in a situation, writing tests, where you got a test failure and realized that your test assertion was wrong?


Jepsen is more like a fuzzer than a unit test suite. It outputs "I did this and I got back this unexpected result". All of those outputs need to be analyzed by hand.

You don't audit a fuzzer to say "what if it runs the thing wrong". That's not the point of the fuzzer. The point is to do lots of weird stuff and check that the system's output matches what you expect it to produce. If the fuzzer reports a result that's actually expected, that's easily determined, because you have to critically analyze whatever comes out of the tool in the first place.
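As a rough sketch of that workflow (everything here is hypothetical: a plain dict stands in for the system under test, where the real thing would be a database cluster with faults being injected), the loop is just: do lots of random operations, compare against a simple reference model, and dump anything surprising for a human to look at:

    import random

    # Trivial reference model and a stand-in "system under test" (hypothetical;
    # in reality the SUT is a real database cluster with faults injected).
    model, sut = {}, {}

    def sut_put(k, v):
        sut[k] = v          # imagine this going over the network to a cluster

    def sut_get(k):
        return sut.get(k)

    suspicious = []
    for i in range(10_000):
        key = random.choice("abc")
        if random.random() < 0.5:
            val = random.randint(0, 99)
            sut_put(key, val)
            model[key] = val
        else:
            got, expected = sut_get(key), model.get(key)
            if got != expected:
                # The fuzzer doesn't decide whether this is a bug; it just
                # records the unexpected result for later hand analysis.
                suspicious.append((i, key, got, expected))

    print(f"{len(suspicious)} unexpected results to analyze by hand")

On a perfect in-memory dict this finds nothing; against a real distributed store under partitions and crashes, the surprising results are where the hand analysis starts.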


It doesn't really matter if the Jepsen test suite is faulty. If it reports something and it is (against all odds) not a valid bug, what's the problem? It doesn't claim to find all problems (and can't), so this is sufficient.



