Keep in mind that tests & test tooling serve developer-ergonomics purposes beyond ensuring that the software works:
1. Reproducibly setting up the conditions to trigger a bug.
2. Explaining how a module expects to be used.
3. Sketching the design of a module's interface & affordances.
4. Nudging away from designs with poor encapsulation.
5. Soothing the anxiety of a developer who is worried they are not making hour-by-hour progress.
6. Reminding a developer what task they were in the middle of when their brain dumps Working Memory due to an interrupt ...or just randomly OOMKills mid-day because a baby woke them up at 3am.
As I am writing my code, I try to make sure that when I need to make a change, I can make part of the change and the type checker will surface everything that needs to change in tandem (not always in a single run). The todo list is then simply the list of remaining errors, and I can usually jump right to exactly where the fix is needed.
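As a minimal, hypothetical sketch of that workflow in Rust: add the new enum variant first, and the exhaustiveness checker turns every match that still needs an arm into a compile error.

```rust
// Hypothetical example: add the `Pause` variant first, and every `match` that
// still lacks an arm becomes a compile error; that error list is the todo list.
enum Command {
    Start,
    Stop,
    Pause, // newly added variant
}

fn describe(cmd: &Command) -> &'static str {
    match cmd {
        Command::Start => "starting",
        Command::Stop => "stopping",
        Command::Pause => "pausing", // arm written by walking the compiler errors
    }
}

fn main() {
    println!("{}", describe(&Command::Pause));
}
```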
This sometimes works out really impressively well. There was a time I was working on a project that required really low latency at low-to-moderate throughput. It was greenfield C development, fairly experimental. I made a habit of tagging functions with an argument describing which of the statically defined threads the function needed to live on, tagging data with which thread it should be accessed from, and checking agreement between them (without runtime overhead).
This couldn't be rigorously enforced by the type system - C has way too many escape hatches - but it was easy to build a little bit of infrastructure such that doing it right was significantly easier than doing it wrong, and doing it wrong was obvious.
When (as is inevitable) we discovered that we wanted to do significantly more work somewhere and so wanted to move it out of the hot path, or discovered that some new thing unavoidably had to live in the hot path, and so on, I could move code between functions and the compiler would tell me everywhere I was doing something that didn't belong on the new thread.
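The infrastructure itself was C-specific, but a rough analogue of the idea can be sketched in Rust with zero-sized marker types (every name below is invented for illustration): data carries the thread it belongs to in its type, functions take a token for the thread they run on, and a mismatch is a type error.

```rust
use std::marker::PhantomData;

// Invented marker types for the statically defined threads.
struct HotPath;
struct Background;

// A zero-sized token a function takes to declare which thread it runs on.
struct OnThread<Thread>(PhantomData<Thread>);

// Data tagged with the thread it must be accessed from.
struct OwnedBy<T, Thread> {
    value: T,
    _thread: PhantomData<Thread>,
}

impl<T, Thread> OwnedBy<T, Thread> {
    fn new(value: T) -> Self {
        OwnedBy { value, _thread: PhantomData }
    }

    // Access requires the matching token; passing the wrong one fails to compile.
    fn get(&self, _here: &OnThread<Thread>) -> &T {
        &self.value
    }
}

// Moving this body to another thread means changing the token type, and the
// compiler then flags every access that no longer belongs there.
fn poll_queue(here: &OnThread<HotPath>, depth: &OwnedBy<u32, HotPath>) -> u32 {
    *depth.get(here)
}

fn main() {
    let here = OnThread::<HotPath>(PhantomData);
    let depth = OwnedBy::<u32, HotPath>::new(7);
    println!("queue depth: {}", poll_queue(&here, &depth));
}
```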
Yes, compile-time type-checking can be replaced with a sufficient quantity of runtime type-checking in the form of unit tests.
However, the compiler's type-checking is automatic, so you don't have to write a ton of unit tests to get its benefit, and the error messages are usually much better.
I've yet to see a dynamically typed codebase that has sufficient unit tests to replace a decent compile-time type-system.
One other point that matters is time: running 20k unit and integration tests to verify the types are all correct for your 6kloc will take much longer than compiling 6kloc, and the compiler does a much better job of updating incrementally based on code changes than any unit testing framework I've used.
> One other point that matters is time: running 20k unit and integration tests to verify the types are all correct for your 6kloc will take much longer than compiling 6kloc
I'll disagree here. The kind of thing you can verify at compile-time can be tested very quickly, and compilers with less useful type systems can be much faster. All in, the tests are probably faster to run.
But yeah, compilers do a better job at incremental validation. That's probably intrinsic; type checking lends itself to incremental work much more naturally than tests do.
> The kind of thing you can verify at compile-time can be tested very quickly
I’m not sure I entirely agree with this. For example, a type system can ensure a function is always called with the right units, but testing that every use of the function is correct might be a lot more tedious, depending on the code base.
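For instance, a minimal hypothetical sketch with unit newtypes: the compiler rejects a swapped argument at every call site, whereas tests would have to cover each call site one by one.

```rust
// Hypothetical unit newtypes: mixing them up is a compile error at every call site,
// with no per-call-site test needed.
struct Meters(f64);
struct Seconds(f64);

fn speed_m_per_s(distance: Meters, time: Seconds) -> f64 {
    distance.0 / time.0
}

fn main() {
    println!("{}", speed_m_per_s(Meters(100.0), Seconds(9.58)));
    // speed_m_per_s(Seconds(9.58), Meters(100.0)); // error[E0308]: mismatched types
}
```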
> and compilers with less useful type systems can be much faster.
This depends very heavily on the language (e.g., OCaml has a relatively complex type system yet compiles fairly quickly), but I'd agree that what you said appears to be the general trend.
The number of subtle bugs and problems that proptest was able to point out to me was terrific.
Property-based testing is definitely the tool you want to use to thoroughly test a complex library that needs to maintain its invariants.
The interesting bit is that if you approach the problem with ordinary unit tests, you will mostly write tests that pass. If you approach test writing as property testing, you end up writing much harder test cases.
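For a flavour of what that looks like with proptest, here is a minimal sketch (the function and the property are made up for illustration):

```rust
use proptest::prelude::*;

// A deliberately simple invariant: "normalizing" is idempotent and preserves length.
fn normalize(mut v: Vec<i32>) -> Vec<i32> {
    v.sort();
    v
}

proptest! {
    #[test]
    fn normalize_is_idempotent(v in prop::collection::vec(any::<i32>(), 0..100)) {
        let once = normalize(v.clone());
        let twice = normalize(once.clone());
        prop_assert_eq!(once.len(), v.len());
        prop_assert_eq!(once, twice);
    }
}
```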
That is the project we are trying to support with our work. Unfortunately, this means I have to solve lots of hard problems like scaling, usability, modular verification, etc.
I'm pleased to see that "verification" isn't just relying on random value generators for property testing in this case. While I agree with the author that actually running code is still useful, simple specifications have the power to be much more than glorified fuzzing - and Rust is the perfect language to start pushing that in. The use of native syntax for int value ranges is especially nice.
The main question is: how does this compare to other Rust verification efforts[0]? Having to write specifications in test crates rather than next to the function definitions is probably going to be a big obstacle to multi-crate verification.
This is more at the automatic end of the verification-tool spectrum, like KLEE, SMACK, etc.
Auto-active tools like Prusti are super interesting too, and I suspect that we need a hybrid approach.
From what I understand of Prusti, it still aims to be automatic and the underlying Viper pipeline still relies on constraint solving for the most part - certainly the Prusti examples don't seem to do anything more complex than that.
Would you be able to outline the main differences?
The difference is really about the level of annotation you provide. In testing (and the more automated verifiers) you might add assertions. In the “auto-active” tools, you add more annotation: function contracts, invariants, maybe hints to help find a proof, etc.
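Roughly, and purely as a sketch, the contrast looks something like this, using the contract attributes from the prusti-contracts crate on a made-up function:

```rust
use prusti_contracts::*;

// "Auto-active" style: the specification lives on the function as a contract,
// and the verifier is asked to discharge it for all inputs.
#[requires(divisor != 0)]
#[ensures(result <= dividend)]
fn div_floor(dividend: u32, divisor: u32) -> u32 {
    dividend / divisor
}

// Testing / automated-verifier style: the same intent expressed as an
// assertion at one particular call site.
fn main() {
    let q = div_floor(10, 3);
    assert!(q <= 10);
}
```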
It is definitely more natural with property-based testing, but I think it works to some extent as a perspective in general.
You have a huge input space where each point represents some combination of possible inputs. You can think of each point as mapping to 0 for correct behaviour and 1 for bad behaviour. Your verifying and testing is an attempt to show, as best you can, that the sum across all the inputs is 0.
For fixed-example testing, you sample some points that you consider representative or boundary cases, show that they are 0, and use that to argue that the whole space is 0. With property testing, you literally let it do the Monte Carlo thing. With verification, you attempt to do the integration analytically.
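Stated a bit more formally (just restating the analogy above; the notation is invented here):

```latex
% Let f flag bad behaviour for each point x of the input space X:
\[
  f : X \to \{0,1\}, \qquad f(x) = 1 \iff \text{input } x \text{ misbehaves}.
\]
% Correctness is the claim that the whole space sums to zero:
\[
  \sum_{x \in X} f(x) = 0.
\]
% Example-based tests check f(x_i) = 0 at hand-picked points x_1, ..., x_n;
% property testing estimates (1/N) \sum_{i=1}^{N} f(X_i) from random samples (Monte Carlo);
% verification argues f(x) = 0 for all x analytically.
```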
I think that just as reliability engineering benefited from moving from 'keep the system always up' (but fail occasionally in the real world) to 'keep the system up 99.999% of the time' (and succeed in a measurable way), fixing bugs in software might benefit from a similarly honest and quantitative approach.
I was under the impression that burntsushi's implementation of quickcheck [0] was the most popular property-based testing library for Rust. Was I mistaken or was there a reason the authors chose another library as inspiration?
As a previous user of quickcheck and now user of proptest, I can say that proptest feels much better to use because it doesn't tie quite as directly to a single type (as a sibling comment mentions).
With quickcheck, I had to define `MyTypeWithAnOddOffset` and `MyTypeWithAnEvenOffset` (or equivalent formulation) and composing things was tedious or hard.
burntsushi's quickcheck only uses the type to generate values.
Proptest lets you constrain the values more precisely.
E.g. you can create a BTreeMap with up to 5 entries where the keys are some subset of the ints (say) and the values are some subset of some other type.
This can capture many of the constraints of a function's precondition and can be implemented efficiently in both fuzzers and verifiers.
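In proptest terms, that composition is just a strategy expression; a rough sketch (the bounds and value strategy are arbitrary choices):

```rust
use proptest::collection::btree_map;
use proptest::prelude::*;

proptest! {
    #[test]
    // Maps with at most 5 entries, keys drawn from 0..100, values from a
    // short-lowercase-string regex strategy.
    fn handles_small_maps(m in btree_map(0..100i32, "[a-z]{0,8}", 0..=5usize)) {
        prop_assert!(m.len() <= 5);
        prop_assert!(m.keys().all(|k| (0..100).contains(k)));
    }
}
```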
It would be cool if you could also use the exact same interface for coverage-based fuzzing, creating a unified property-testing/fuzzing/formal-verification interface.