Post author here. Sorry it was vague, but there's only so much detail you can go into in a blog post aimed at general audiences. Our documentation (https://antithesis.com/docs/) has a lot more info.
Here's my attempt at a more complete answer: think of the story of the blind men and the elephant. There's a thing, called fuzzing, invented by security researchers. There's a thing, called property-based testing, invented by functional programmers. There's a thing, called network simulation, invented by distributed systems people. There's a thing, called rare-event simulation, invented by physicists (!). But if you squint, all of these things are really the same kind of thing, which we call "autonomous testing". It's where you express high-level properties of your system, and have the computer do the grunt work to see if they're true. Antithesis is our attempt to take the best ideas from each of these fields, and turn them into something really usable for the vast majority of software.
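To make "express high-level properties and have the computer do the grunt work" concrete, here's a toy property-based test in Python using the Hypothesis library. This is just an illustration of the genre, not anything Antithesis-specific:

```python
from hypothesis import given, strategies as st

# A toy codec: run-length encoding.
def run_length_encode(s):
    pairs = []
    for ch in s:
        if pairs and pairs[-1][0] == ch:
            pairs[-1][1] += 1
        else:
            pairs.append([ch, 1])
    return pairs

def run_length_decode(pairs):
    return "".join(ch * n for ch, n in pairs)

# The high-level property: decoding an encoding returns the original.
# We never write a single concrete test case; the computer searches.
@given(st.text())
def test_roundtrip(s):
    assert run_length_decode(run_length_encode(s)) == s

test_roundtrip()  # Hypothesis generates hundreds of strings, incl. nasty ones
```

Hypothesis will throw pathological inputs at the property, and if it finds a falsifying one, it shrinks it to a minimal counterexample. That loop, stated generally, is the shared skeleton of all the fields above.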
We believe the two fundamental problems preventing widespread adoption of autonomous testing are: (1) most software is non-deterministic, but non-determinism breaks the core feedback loop that guides things like coverage-guided fuzzing. (2) the state space you're searching is inconceivably vast, and the search problem in full generality is insolubly hard. Antithesis tries to address both of these problems.
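To see why (1) matters, here is the feedback loop at the heart of a coverage-guided fuzzer, boiled down to a sketch. The toy target and mutator are my own stand-ins; real fuzzers like AFL are far more sophisticated:

```python
import random

def mutate(data: bytes) -> bytes:
    # Trivial mutator: overwrite one random byte with a random value.
    if not data:
        return bytes([random.randrange(256)])
    i = random.randrange(len(data))
    return data[:i] + bytes([random.randrange(256)]) + data[i + 1:]

def fuzz(seeds, run_and_get_coverage, iterations=20_000):
    """Coverage-guided fuzzing loop. run_and_get_coverage must return the
    set of branches an input exercised, and the loop only works if that
    answer is reproducible. Under non-determinism, "this input reached
    new branches" becomes noise and guided search decays into random testing."""
    corpus = list(seeds)
    seen = set()
    for _ in range(iterations):
        child = mutate(random.choice(corpus))
        coverage = run_and_get_coverage(child)
        if coverage - seen:             # reached branches we've never seen?
            seen |= coverage
            corpus.append(child)        # keep it as a parent for future mutants
    return corpus

# Toy target: "coverage" is the set of prefix-match depths achieved.
def toy_target(data: bytes):
    secret = b"BUG!"
    n = 0
    while n < len(secret) and data[n:n + 1] == secret[n:n + 1]:
        n += 1
    return {f"matched_{i}" for i in range(n + 1)}

corpus = fuzz([b"AAAA"], toy_target)
print(any(c[:4] == b"BUG!" for c in corpus))  # usually True: guidance works
```

Random testing would need about 256^4 tries to hit the "bug"; the coverage signal lets the fuzzer climb to it one byte at a time. Break determinism and that signal evaporates.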
So... is it fuzzing? Sort of, except you can apply it to whole interacting networked systems, not just standalone parsers and libraries. Is it property-based testing? Sort of, except you can express properties that require a "global" view of the entire state space traversed by the system, which could never be locally asserted in code. Is it fault injection or chaos testing? Sort of, except that it can use the techniques of coverage-guided fuzzing to get deep into the nooks and crannies of your software, and determinism to ensure that every bug is replayable, no matter how weird it is.
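A made-up example of such a global property: "if any node commits a transaction, every node eventually commits it." No single node can assert that locally, but you can check it over the full recorded history of a simulated run, roughly like this:

```python
# Hypothetical sketch: checking a "global" property over the recorded
# history of one simulated run. `events` is a list of
# (virtual_time, node, action, txn_id) tuples captured during the run.
def all_commits_propagate(events, all_nodes):
    committed = {}                        # txn_id -> nodes that committed it
    for (_, node, action, txn) in events:
        if action == "commit":
            committed.setdefault(txn, set()).add(node)
    # By the end of the run, anything committed anywhere must be
    # committed everywhere: a claim no in-code assertion could make.
    return all(nodes == set(all_nodes) for nodes in committed.values())

events = [(1, "n1", "commit", "t1"), (5, "n2", "commit", "t1"),
          (9, "n3", "commit", "t1")]
assert all_commits_propagate(events, ["n1", "n2", "n3"])
```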
It's hard to explain, because it's hard to wrap your arms around the whole thing. But our other big goal is to make all of this easy to understand and easy to use. In some ways, that's proved to be even harder than the very hard technological problems we've faced. But we're excited and up for it, and we think the payoff could be big for our whole industry.
Your feedback about what's explained well and what's explained poorly is an important signal for us in this third very hard task. Please keep giving it to us!
I remember watching the Strange Loop video on your testing strategy, and now I need to go back and relearn how it differs from model checking (e.g. Promela or TLA+). Model checking is probably the big QA story that tech companies ignore, because it requires dramatically more education, especially from QA departments typically seen as "inferior" to SWE.
This is interesting - it is kind of picking a fight with SaaS/cloud providers though, as that is the one kind of software you won't be able to import into your environment: not because it can't do the job, but because you don't have the code. So this would create an incentive to go back to PaaS.
It's definitely true though that a big problem with backend is that you can't easily treat it as a whole system for test purposes.
Is there more info on how Antithesis solves problem number 2 (large state spaces)? I understand the fuzzing / workload generation part well, but there's so many different state space reduction techniques that I don't know what Antithesis is doing under the hood to combat that.
Doesn't Antithesis rely on the fact that software is always deterministic? Reproducibility appears to be its top selling feature – something that wouldn't be possible if software were non-deterministic.
* Offer only good for x86-64 software that runs on Linux whose dependencies you can install locally or mock. The first two restrictions we will probably relax someday.
Right, by emulating a deterministic computer you can ensure that the inputs to the software are always deterministic – something traditional computing environments are unable to offer for various reasons.
However, if we pretend that software was somehow able to be non-deterministic, it would be able to evade your deterministic computer. But since software is always deterministic, you just have to guarantee determinism in the inputs.
>But since software is always deterministic, you just have to guarantee determinism in the inputs.
This is technically correct, but that's a very load-bearing "just". A lot of things would have to count as inputs. Think about execution time, for example. CPUs don't execute at the same speed all the time because of automatic throttling. Network packets have different flight times. Threads and processes get scheduled a little differently. In distributed/concurrent systems, all this matters. If you run the same workload twice, observable events will happen at different times and in different orders because of tiny deviations in initial conditions.
So yes, if you consider the time it takes to run every single machine instruction as an "input", then software is deterministic given the same inputs. But in the real world that's not actionable. Even if you had all those inputs, how are you going to pass them in? For all intents and purposes most software execution is non-deterministic.
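A classic demonstration, for anyone who wants to see it: same program, same "inputs" in the everyday sense, yet the answer changes, because the scheduler's interleaving is itself a hidden input:

```python
import threading

# Four threads increment a shared counter without a lock. The
# read-modify-write in `counter += 1` is not atomic, so updates are
# lost whenever the scheduler switches threads mid-operation.
counter = 0

def bump(n):
    global counter
    for _ in range(n):
        counter += 1

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Typically prints less than 400000, and a different number each run
# (exact behavior varies by interpreter version and machine load).
print(counter)
```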
The Antithesis simulation is deterministic in this way though. It is in charge of how long everything takes in "simulated time", right down to the running times of individual CPU instructions. Everything observable from within the simulation happens the exact same way, every time. You can compare a memory dump at the same (simulated) instant across two different runs and they will be bit-for-bit identical.
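A toy sketch of the principle (nothing like the real Antithesis hypervisor, just the shape of the idea): if one event loop owns "time" and every delay is drawn from a seeded PRNG, the entire run is a pure function of the seed:

```python
import heapq
import random

# Toy discrete-event simulation with fully simulated time. Every delay
# comes from a seeded PRNG, so replaying the same seed reproduces every
# event at the exact same simulated instant.
class Simulation:
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.now = 0.0
        self.queue = []     # (time, sequence, callback)
        self.seq = 0

    def schedule(self, delay, callback):
        self.seq += 1       # unique tie-breaker keeps ordering deterministic
        heapq.heappush(self.queue, (self.now + delay, self.seq, callback))

    def send(self, handler):
        # "Network" latency is simulated, not measured.
        self.schedule(self.rng.uniform(0.001, 0.200), handler)

    def run(self):
        while self.queue:
            self.now, _, cb = heapq.heappop(self.queue)
            cb(self)

trace = []

def ping(sim):
    trace.append(("ping", sim.now))
    sim.send(pong)

def pong(sim):
    trace.append(("pong", sim.now))

sim = Simulation(seed=42)
sim.schedule(0, ping)
sim.run()
print(trace)  # identical on every run with seed=42
```

Run it twice with seed=42 and the trace, including every simulated timestamp, is bit-for-bit identical; change the seed and you explore a different schedule.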
Sure. A good example. Execution time – more accurately, execution speed – isn't a property of software. For example, as you point out yourself, you can alter the execution speed without altering the software. It is, indeed, an input.
> Even if you had all those inputs, how are you going to pass them in?
Well, we know how to pass them in non-deterministically. That's how software is able to do anything.
Perhaps one could create a simulated environment that is able to control all the inputs? In fact, I'm told there is a company known as Antithesis working on exactly that.
Is the challenge here the same as with digital simulations of electronic circuits? That is, at the end of the day analog physics becomes confounding? Or are you doing deterministic simulation of random RF noise as well?
Has any thought been given to repurposing this deterministic computer for more than just autonomous testing/fuzzing? For example, given an ability to record/snapshot the state, resumable software (i.e. durable execution)?
Somebody once suggested to me that this could be very handy for the reproducible-builds folks. I'm sure that now that we're out in the open, lots of people will suggest great applications for it.
My favourite application for "deterministic computer" is creating a cluster in order to have a virtual machine which is resilient to hardware failure. Potentially even "this VM will keep running even if an entire AWS region goes down" (although that would add significant latency).
This vaguely reminds me of Jefferson's "Virtual Time" paper from 1985[1]. The underlying idea at the time didn't really take off because it required, like Zookeeper, a greenfield project: except that it kinda doesn't and today you could imagine instrumenting an entire Linux syscall table and letting any Linux container become a virtual time system -- but Linux didn't exist in 1985 and wouldn't be standard until much later.
So Jefferson just says: let's take your I/O-ful process, split it into a message-passing actor model, and monitor all the messages going in and coming out. The messages coming out won't necessarily do what they're supposed to do yet; they'll just be recorded with a plus sign and a virtual timestamp, and by assumption you'll eventually block on some response. So we have a bunch of recorded, timestamped messages coming in, and we have your recorded messages going out.
Well, there's a problem here, which is that if we have multiple actors we may discover that their timestamps have traveled out-of-order. You sent some message at t=532, but someone actually sent you a message at t=231 that you might have acted on instead of whatever you actually did before sending the t=532 message. (For instance, in the OS case, they might have literally sent a SIGKILL to your process, and you might not have sent anything after that.)

That's what the plus sign is for, indirectly: we can restart your process from either a known synchronization state or else from the very beginning. We know all of its inputs from its first run, so we have "determinized" it up past t=231 and can see what it does now. Say it now sends a new message at t=373. Then we use the opposite of +, the minus sign, to send all the other processes an "undo" (an anti-message) for their t=532 message, which removes it from their message buffers: that message will never be sent to them. If they haven't hit that timestamp in their own processing yet, no further action is needed; otherwise we need to roll them back too. Doing this recursively, you determinize the whole networked cluster.
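For anyone who'd rather read code than prose, here's a heavily simplified sketch of that mechanism: optimistic execution, state snapshots, stragglers forcing rollback, and anti-messages cascading to other processes. This is my own reconstruction of the paper's core idea, not production code (a real Time Warp also needs global-virtual-time computation, fossil collection, and unique message identities):

```python
import bisect

class Process:
    def __init__(self, name, network, on_message):
        self.name, self.net = name, network
        self.on_message = on_message    # app logic: gets (proc, t, payload)
        self.lvt = 0                    # local virtual time
        self.state = []                 # toy state: log of processed payloads
        self.inputq = []                # pending (t, payload), sorted by t
        self.processed = []             # (t, payload) we already consumed
        self.snapshots = [(0, [])]      # (lvt, state); assumes all t > 0
        self.outputs = []               # (send_lvt, dest, t, payload) we sent

    def receive(self, t, payload, sign):
        if sign > 0:
            if t < self.lvt:            # straggler from our past: roll back
                self.rollback(t)
            bisect.insort(self.inputq, (t, payload))
        elif (t, payload) in self.inputq:
            self.inputq.remove((t, payload))   # annihilate unprocessed twin
        else:                           # we already processed the twin:
            self.rollback(t)            # undo its effects (re-queues it)...
            self.inputq.remove((t, payload))   # ...then drop it for good

    def step(self):
        if not self.inputq:
            return False
        t, payload = self.inputq.pop(0)  # optimistically run earliest message
        self.lvt = t
        self.state = self.state + [payload]
        self.processed.append((t, payload))
        self.snapshots.append((t, self.state))
        self.on_message(self, t, payload)
        return True

    def send(self, dest, t, payload):
        self.outputs.append((self.lvt, dest, t, payload))
        self.net.route(dest, t, payload, +1)

    def rollback(self, t):
        # Restore the latest snapshot strictly before t...
        self.lvt, self.state = max((s for s in self.snapshots if s[0] < t),
                                   key=lambda s: s[0])
        self.snapshots = [s for s in self.snapshots if s[0] < t]
        # ...re-queue messages we had prematurely processed at/after t...
        for (pt, pp) in [p for p in self.processed if p[0] >= t]:
            bisect.insort(self.inputq, (pt, pp))
        self.processed = [p for p in self.processed if p[0] < t]
        # ...and cancel everything we sent from the rolled-back era with
        # anti-messages (the minus sign), which may cascade to other nodes.
        for (slvt, dest, mt, mp) in [o for o in self.outputs if o[0] >= t]:
            self.net.route(dest, mt, mp, -1)
        self.outputs = [o for o in self.outputs if o[0] < t]

class Network:
    # Instant, in-order delivery, for simplicity.
    def __init__(self): self.procs = {}
    def register(self, p): self.procs[p.name] = p
    def route(self, dest, t, payload, sign):
        self.procs[dest].receive(t, payload, sign)

# A optimistically handles t=532 and notifies B; then a straggler at t=231
# arrives, so A rolls back, an anti-message undoes B, and both re-execute.
net = Network()
a = Process("A", net, lambda p, t, m: p.send("B", t + 10, m + "!"))
b = Process("B", net, lambda p, t, m: None)
net.register(a); net.register(b)
net.route("A", 532, "late", +1)
while a.step() or b.step(): pass
net.route("A", 231, "early", +1)      # the straggler
while a.step() or b.step(): pass
print(a.state, b.state)               # ['early', 'late'] ['early!', 'late!']
```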
The only other really modern implementation of these older ideas that I remember seeing was Haxl[2], a Haskell library which does something similar but rather than using a virtual time coordinate, it just uses a process-local cache: when you request any I/O, it first fetches from the cache if possible and then if that's not possible it goes out, fetches the data, and then caches it. As a result you can just offer someone a pre-populated cache which, with these recorded inputs, will regenerate the offending stack trace deterministically.
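The same record/replay-cache pattern, sketched in Python rather than Haskell (the shape of the idea only, not Haxl's actual API):

```python
import json

class ReplayCache:
    """Every I/O goes through a cache keyed by the request. In live mode
    the cache records results; in replay mode it serves them back, so a
    failing run can be re-executed deterministically from its recording."""
    def __init__(self, recorded=None):
        self.cache = dict(recorded or {})

    def fetch(self, request, do_io):
        key = json.dumps(request, sort_keys=True)  # stable key per request
        if key not in self.cache:
            self.cache[key] = do_io(request)       # live: actually do the I/O
        return self.cache[key]

def fake_network_call(req):                        # stand-in for real I/O
    return {"id": req["id"], "name": "alice"}

def no_io(req):
    raise RuntimeError("replay should never touch the network")

live = ReplayCache()
user = live.fetch({"op": "get_user", "id": 7}, fake_network_call)

# Ship live.cache alongside the bug report; replay regenerates the same
# execution without performing any real I/O at all.
replay = ReplayCache(recorded=live.cache)
assert replay.fetch({"op": "get_user", "id": 7}, no_io) == user
```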
> Your feedback about what's explained well and what's explained poorly is an important signal for us in this third very hard task. Please keep giving it to us!
It's hard to understand these complex concepts via language alone.
Diagrams would be a huge help in understanding how this system of testing works compared to existing testing concepts.
Thanks, I'll dig in. I'm a very visual person, and charts/diagrams/flows always help me grasp something better than a wall of text does. Maybe include some of those in there when you get the time?