Post author here. Sorry it was vague, but there's only so much detail you can go into in a blog post aimed at general audiences. Our documentation (https://antithesis.com/docs/) has a lot more info.
Here's my attempt at a more complete answer: think of the story of the blind men and the elephant. There's a thing, called fuzzing, invented by security researchers. There's a thing, called property-based testing, invented by functional programmers. There's a thing, called network simulation, invented by distributed systems people. There's a thing, called rare-event simulation, invented by physicists (!). But if you squint, all of these things are really the same kind of thing, which we call "autonomous testing". It's where you express high-level properties of your system, and have the computer do the grunt work to see if they're true. Antithesis is our attempt to take the best ideas from each of these fields, and turn them into something really usable for the vast majority of software.
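To make "express high-level properties and have the computer do the grunt work" concrete, here's a toy property-based test in Python using the Hypothesis library. This is just an illustration of the genre, not anything Antithesis-specific:

```python
from hypothesis import given, strategies as st

# A toy codec: run-length encoding.
def run_length_encode(s):
    pairs = []
    for ch in s:
        if pairs and pairs[-1][0] == ch:
            pairs[-1][1] += 1
        else:
            pairs.append([ch, 1])
    return pairs

def run_length_decode(pairs):
    return "".join(ch * n for ch, n in pairs)

# The high-level property: decoding an encoding returns the original.
# We never write a single concrete test case; the computer searches.
@given(st.text())
def test_roundtrip(s):
    assert run_length_decode(run_length_encode(s)) == s

test_roundtrip()  # Hypothesis generates hundreds of strings, incl. nasty ones
```

Hypothesis will throw pathological inputs at the property, and if it finds a falsifying one, it shrinks it to a minimal counterexample. That loop, stated generally, is the shared skeleton of all the fields above.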
We believe the two fundamental problems preventing widespread adoption of autonomous testing are: (1) most software is non-deterministic, but non-determinism breaks the core feedback loop that guides things like coverage-guided fuzzing. (2) the state space you're searching is inconceivably vast, and the search problem in full generality is insolubly hard. Antithesis tries to address both of these problems.
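To see why (1) matters, here is the feedback loop at the heart of a coverage-guided fuzzer, boiled down to a sketch. The toy target and mutator are my own stand-ins; real fuzzers like AFL are far more sophisticated:

```python
import random

def mutate(data: bytes) -> bytes:
    # Trivial mutator: overwrite one random byte with a random value.
    if not data:
        return bytes([random.randrange(256)])
    i = random.randrange(len(data))
    return data[:i] + bytes([random.randrange(256)]) + data[i + 1:]

def fuzz(seeds, run_and_get_coverage, iterations=20_000):
    """Coverage-guided fuzzing loop. run_and_get_coverage must return the
    set of branches an input exercised, and the loop only works if that
    answer is reproducible. Under non-determinism, "this input reached
    new branches" becomes noise and guided search decays into random testing."""
    corpus = list(seeds)
    seen = set()
    for _ in range(iterations):
        child = mutate(random.choice(corpus))
        coverage = run_and_get_coverage(child)
        if coverage - seen:             # reached branches we've never seen?
            seen |= coverage
            corpus.append(child)        # keep it as a parent for future mutants
    return corpus

# Toy target: "coverage" is the set of prefix-match depths achieved.
def toy_target(data: bytes):
    secret = b"BUG!"
    n = 0
    while n < len(secret) and data[n:n + 1] == secret[n:n + 1]:
        n += 1
    return {f"matched_{i}" for i in range(n + 1)}

corpus = fuzz([b"AAAA"], toy_target)
print(any(c[:4] == b"BUG!" for c in corpus))  # usually True: guidance works
```

Random testing would need about 256^4 tries to hit the "bug"; the coverage signal lets the fuzzer climb to it one byte at a time. Break determinism and that signal evaporates.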
So... is it fuzzing? Sort of, except you can apply it to whole interacting networked systems, not just standalone parsers and libraries. Is it property-based testing? Sort of, except you can express properties that require a "global" view of the entire state space traversed by the system, which could never be locally asserted in code. Is it fault injection or chaos testing? Sort of, except that it can use the techniques of coverage-guided fuzzing to get deep into the nooks and crannies of your software, and determinism to ensure that every bug is replayable, no matter how weird it is.
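A made-up example of such a global property: "if any node commits a transaction, every node eventually commits it." No single node can assert that locally, but you can check it over the full recorded history of a simulated run, roughly like this:

```python
# Hypothetical sketch: checking a "global" property over the recorded
# history of one simulated run. `events` is a list of
# (virtual_time, node, action, txn_id) tuples captured during the run.
def all_commits_propagate(events, all_nodes):
    committed = {}                        # txn_id -> nodes that committed it
    for (_, node, action, txn) in events:
        if action == "commit":
            committed.setdefault(txn, set()).add(node)
    # By the end of the run, anything committed anywhere must be
    # committed everywhere: a claim no in-code assertion could make.
    return all(nodes == set(all_nodes) for nodes in committed.values())

events = [(1, "n1", "commit", "t1"), (5, "n2", "commit", "t1"),
          (9, "n3", "commit", "t1")]
assert all_commits_propagate(events, ["n1", "n2", "n3"])
```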
It's hard to explain, because it's hard to wrap your arms around the whole thing. But our other big goal is to make all of this easy to understand and easy to use. In some ways, that's proved to be even harder than the very hard technological problems we've faced. But we're excited and up for it, and we think the payoff could be big for our whole industry.
Your feedback about what's explained well and what's explained poorly is an important signal for us in this third very hard task. Please keep giving it to us!
I remember watching the Strange Loop video on your testing strategy, and now I need to go back and relearn how it differs from model checking (e.g. Promela or TLA+). Model checking is probably the big QA story that tech companies ignore, because it requires dramatically more education, especially from QA departments typically seen as "inferior" to SWE.
This is interesting - it is kind of picking a fight with SaaS/cloud providers though, as that is the one kind of software you won't be able to import into your environment: not because it can't do the job, but because you don't have the code. So this would create an incentive to go back to PaaS.
It's definitely true though that a big problem with backend is that you can't easily treat it as a whole system for test purposes.
Is there more info on how Antithesis solves problem number 2 (large state spaces)? I understand the fuzzing / workload generation part well, but there's so many different state space reduction techniques that I don't know what Antithesis is doing under the hood to combat that.
Doesn't Antithesis rely on the fact that software is always deterministic? Reproducibility appears to be its top selling feature – something that wouldn't be possible if software were non-deterministic.
* Offer only good for x86-64 software that runs on Linux whose dependencies you can install locally or mock. The first two restrictions we will probably relax someday.
Right, by emulating a deterministic computer you can ensure that the inputs to the software are always deterministic – something traditional computing environments are unable to offer for various reasons.
However, if we pretend that software was somehow able to be non-deterministic, it would be able to evade your deterministic computer. But since software is always deterministic, you just have to guarantee determinism in the inputs.
>But since software is always deterministic, you just have to guarantee determinism in the inputs.
This is technically correct, but that's a very load-bearing "just". A lot of things would have to count as inputs. Think about execution time, for example. CPUs don't execute at the same speed all the time because of automatic throttling. Network packets have different flight times. Threads and processes get scheduled a little differently. In distributed/concurrent systems, all this matters. If you run the same workload twice, observable events will happen at different times and in different orders because of tiny deviations in initial conditions.
So yes, if you consider the time it takes to run every single machine instruction as an "input", then software is deterministic given the same inputs. But in the real world that's not actionable. Even if you had all those inputs, how are you going to pass them in? For all intents and purposes most software execution is non-deterministic.
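A classic demonstration, for anyone who wants to see it: same program, same "inputs" in the everyday sense, yet the answer changes, because the scheduler's interleaving is itself a hidden input:

```python
import threading

# Four threads increment a shared counter without a lock. The
# read-modify-write in `counter += 1` is not atomic, so updates are
# lost whenever the scheduler switches threads mid-operation.
counter = 0

def bump(n):
    global counter
    for _ in range(n):
        counter += 1

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Typically prints less than 400000, and a different number each run
# (exact behavior varies by interpreter version and machine load).
print(counter)
```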
The Antithesis simulation is deterministic in this way though. It is in charge of how long everything takes in "simulated time", right down to the running times of individual CPU instructions. Everything observable from within the simulation happens the exact same way, every time. You can compare a memory dump at the same (simulated) instant across two different runs and they will be bit-for-bit identical.
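A toy sketch of the principle (nothing like the real Antithesis hypervisor, just the shape of the idea): if one event loop owns "time" and every delay is drawn from a seeded PRNG, the entire run is a pure function of the seed:

```python
import heapq
import random

# Toy discrete-event simulation with fully simulated time. Every delay
# comes from a seeded PRNG, so replaying the same seed reproduces every
# event at the exact same simulated instant.
class Simulation:
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.now = 0.0
        self.queue = []     # (time, sequence, callback)
        self.seq = 0

    def schedule(self, delay, callback):
        self.seq += 1       # unique tie-breaker keeps ordering deterministic
        heapq.heappush(self.queue, (self.now + delay, self.seq, callback))

    def send(self, handler):
        # "Network" latency is simulated, not measured.
        self.schedule(self.rng.uniform(0.001, 0.200), handler)

    def run(self):
        while self.queue:
            self.now, _, cb = heapq.heappop(self.queue)
            cb(self)

trace = []

def ping(sim):
    trace.append(("ping", sim.now))
    sim.send(pong)

def pong(sim):
    trace.append(("pong", sim.now))

sim = Simulation(seed=42)
sim.schedule(0, ping)
sim.run()
print(trace)  # identical on every run with seed=42
```

Run it twice with seed=42 and the trace, including every simulated timestamp, is bit-for-bit identical; change the seed and you explore a different schedule.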
Sure. A good example. Execution time – more accurately, execution speed – isn't a property of software. For example, as you point out yourself, you can alter the execution speed without altering the software. It is, indeed, an input.
> Even if you had all those inputs, how are you going to pass them in?
Well, we know how to pass them in non-deterministically. That's how software is able to do anything.
Perhaps one could create a simulated environment that is able to control all the inputs? In fact, I'm told there is a company known as Antithesis working on exactly that.
Is the challenge here the same as with digital simulations of electronic circuits? That is, at the end of the day analog physics becomes confounding? Or are you doing deterministic simulation of random RF noise as well?
Has any thought been given to repurposing this deterministic computer for more than just autonomous testing/fuzzing? For example, given an ability to record/snapshot the state, resumable software (i.e. durable execution)?
Somebody once suggested to me that this could be very handy for the reproducible-builds folks. I'm sure that now that we're out in the open, lots of people will suggest great applications for it.
My favourite application for "deterministic computer" is creating a cluster in order to have a virtual machine which is resilient to hardware failure. Potentially even "this VM will keep running even if an entire AWS region goes down" (although that would add significant latency).
This vaguely reminds me of Jefferson's "Virtual Time" paper from 1985[1]. The underlying idea at the time didn't really take off because it required, like Zookeeper, a greenfield project: except that it kinda doesn't and today you could imagine instrumenting an entire Linux syscall table and letting any Linux container become a virtual time system -- but Linux didn't exist in 1985 and wouldn't be standard until much later.
So Jefferson just says: let's take your I/O-ful process, split it into a message-passing actor model, and monitor all the messages going in and coming out. The messages coming out won't necessarily do what they're supposed to do yet; they'll just be recorded with a plus sign and a virtual timestamp, and by assumption you'll eventually block on some response. So we have a bunch of recorded, timestamped messages coming in, and we have your recorded messages going out.
Well, there's a problem here, which is that if we have multiple actors we may discover that their timestamps have traveled out-of-order. You sent some message at t=532, but someone actually sent you a message at t=231 that you might have acted on instead of whatever you actually did before sending the t=532 message. (For instance, in the OS case, they might have literally sent a SIGKILL to your process, and you might not have sent anything after that.)

That's what the plus sign is for, indirectly: we can restart your process from either a known synchronization state or else from the very beginning. We know all of its inputs from its first run, so we have "determinized" it up past t=231 and can see what it does now. Say it now sends a new message at t=373. Then we use the opposite of +, the minus sign, to send all the other processes an "undo" (an anti-message) for their t=532 message, which removes it from their message buffers: that message will never be sent to them. If they haven't hit that timestamp in their own processing yet, no further action is needed; otherwise we need to roll them back too. Doing this recursively, you determinize the whole networked cluster.
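For anyone who'd rather read code than prose, here's a heavily simplified sketch of that mechanism: optimistic execution, state snapshots, stragglers forcing rollback, and anti-messages cascading to other processes. This is my own reconstruction of the paper's core idea, not production code (a real Time Warp also needs global-virtual-time computation, fossil collection, and unique message identities):

```python
import bisect

class Process:
    def __init__(self, name, network, on_message):
        self.name, self.net = name, network
        self.on_message = on_message    # app logic: gets (proc, t, payload)
        self.lvt = 0                    # local virtual time
        self.state = []                 # toy state: log of processed payloads
        self.inputq = []                # pending (t, payload), sorted by t
        self.processed = []             # (t, payload) we already consumed
        self.snapshots = [(0, [])]      # (lvt, state); assumes all t > 0
        self.outputs = []               # (send_lvt, dest, t, payload) we sent

    def receive(self, t, payload, sign):
        if sign > 0:
            if t < self.lvt:            # straggler from our past: roll back
                self.rollback(t)
            bisect.insort(self.inputq, (t, payload))
        elif (t, payload) in self.inputq:
            self.inputq.remove((t, payload))   # annihilate unprocessed twin
        else:                           # we already processed the twin:
            self.rollback(t)            # undo its effects (re-queues it)...
            self.inputq.remove((t, payload))   # ...then drop it for good

    def step(self):
        if not self.inputq:
            return False
        t, payload = self.inputq.pop(0)  # optimistically run earliest message
        self.lvt = t
        self.state = self.state + [payload]
        self.processed.append((t, payload))
        self.snapshots.append((t, self.state))
        self.on_message(self, t, payload)
        return True

    def send(self, dest, t, payload):
        self.outputs.append((self.lvt, dest, t, payload))
        self.net.route(dest, t, payload, +1)

    def rollback(self, t):
        # Restore the latest snapshot strictly before t...
        self.lvt, self.state = max((s for s in self.snapshots if s[0] < t),
                                   key=lambda s: s[0])
        self.snapshots = [s for s in self.snapshots if s[0] < t]
        # ...re-queue messages we had prematurely processed at/after t...
        for (pt, pp) in [p for p in self.processed if p[0] >= t]:
            bisect.insort(self.inputq, (pt, pp))
        self.processed = [p for p in self.processed if p[0] < t]
        # ...and cancel everything we sent from the rolled-back era with
        # anti-messages (the minus sign), which may cascade to other nodes.
        for (slvt, dest, mt, mp) in [o for o in self.outputs if o[0] >= t]:
            self.net.route(dest, mt, mp, -1)
        self.outputs = [o for o in self.outputs if o[0] < t]

class Network:
    # Instant, in-order delivery, for simplicity.
    def __init__(self): self.procs = {}
    def register(self, p): self.procs[p.name] = p
    def route(self, dest, t, payload, sign):
        self.procs[dest].receive(t, payload, sign)

# A optimistically handles t=532 and notifies B; then a straggler at t=231
# arrives, so A rolls back, an anti-message undoes B, and both re-execute.
net = Network()
a = Process("A", net, lambda p, t, m: p.send("B", t + 10, m + "!"))
b = Process("B", net, lambda p, t, m: None)
net.register(a); net.register(b)
net.route("A", 532, "late", +1)
while a.step() or b.step(): pass
net.route("A", 231, "early", +1)      # the straggler
while a.step() or b.step(): pass
print(a.state, b.state)               # ['early', 'late'] ['early!', 'late!']
```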
The only other really modern implementation of these older ideas that I remember seeing was Haxl[2], a Haskell library which does something similar but rather than using a virtual time coordinate, it just uses a process-local cache: when you request any I/O, it first fetches from the cache if possible and then if that's not possible it goes out, fetches the data, and then caches it. As a result you can just offer someone a pre-populated cache which, with these recorded inputs, will regenerate the offending stack trace deterministically.
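The same record/replay-cache pattern, sketched in Python rather than Haskell (the shape of the idea only, not Haxl's actual API):

```python
import json

class ReplayCache:
    """Every I/O goes through a cache keyed by the request. In live mode
    the cache records results; in replay mode it serves them back, so a
    failing run can be re-executed deterministically from its recording."""
    def __init__(self, recorded=None):
        self.cache = dict(recorded or {})

    def fetch(self, request, do_io):
        key = json.dumps(request, sort_keys=True)  # stable key per request
        if key not in self.cache:
            self.cache[key] = do_io(request)       # live: actually do the I/O
        return self.cache[key]

def fake_network_call(req):                        # stand-in for real I/O
    return {"id": req["id"], "name": "alice"}

def no_io(req):
    raise RuntimeError("replay should never touch the network")

live = ReplayCache()
user = live.fetch({"op": "get_user", "id": 7}, fake_network_call)

# Ship live.cache alongside the bug report; replay regenerates the same
# execution without performing any real I/O at all.
replay = ReplayCache(recorded=live.cache)
assert replay.fetch({"op": "get_user", "id": 7}, no_io) == user
```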
> Your feedback about what's explained well and what's explained poorly is an important signal for us in this third very hard task. Please keep giving it to us!
It's hard to understand these complex concepts via language alone.
Diagrams would be a huge help in understanding how this system of testing works compared to existing testing concepts.
Thanks, I'll dig in. I'm a very visual person, and charts/diagrams/flows always help me grasp something better than a wall of text does. Maybe include some of those in there when you get the time?