Keep in mind that tests & test tooling serve developer-ergonomics purposes beyond ensuring that the software works:
1. Reproducibly setting up the conditions to trigger a bug.
2. Explaining how a module expects to be used.
3. Sketching the design of a module's interface & affordances.
4. Nudging away from designs with poor encapsulation.
5. Soothing the anxiety of a developer who is worried they are not making hour-by-hour progress.
6. Reminding a developer what task they were in the middle of when their brain dumps Working Memory due to an interrupt ...or just randomly OOMKills mid-day because a baby woke them up at 3am.
As I am writing my code, I try to make sure that when I need to make a change, I can make part of the change and the type checker will surface everything that needs to change in tandem (not always in a single run). The todo list is then simply the list of remaining errors, and I can usually jump right to exactly where the fix is needed.
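As a minimal, hypothetical sketch of that workflow in Rust: add the new enum variant first, and the exhaustiveness checker turns every match that still needs an arm into a compile error.

```rust
// Hypothetical example: add the `Pause` variant first, and every `match` that
// still lacks an arm becomes a compile error; that error list is the todo list.
enum Command {
    Start,
    Stop,
    Pause, // newly added variant
}

fn describe(cmd: &Command) -> &'static str {
    match cmd {
        Command::Start => "starting",
        Command::Stop => "stopping",
        Command::Pause => "pausing", // arm written by walking the compiler errors
    }
}

fn main() {
    println!("{}", describe(&Command::Pause));
}
```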
This sometimes works out really impressively well. There was a time I was working on a project that required really low latency at low-to-moderate throughput. It was greenfield C development, fairly experimental. I made a habit of tagging functions with an argument describing which of the statically defined threads the function needed to live on, tagging data with which thread it should be accessed from, and checking agreement between them (without runtime overhead).
This couldn't be rigorously enforced by the type system - C has way too many escape hatches - but it was easy to build a little bit of infrastructure such that doing it right was significantly easier than doing it wrong, and doing it wrong was obvious.
When (as is inevitable) we discovered that we wanted to do significantly more work somewhere and so wanted to move it out of the hot path, or discovered that some new thing unavoidably had to live in the hot path, and so on, I could move code between functions and the compiler would tell me everywhere I was doing something that didn't belong on the new thread.
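The infrastructure itself was C-specific, but a rough analogue of the idea can be sketched in Rust with zero-sized marker types (every name below is invented for illustration): data carries the thread it belongs to in its type, functions take a token for the thread they run on, and a mismatch is a type error.

```rust
use std::marker::PhantomData;

// Invented marker types for the statically defined threads.
struct HotPath;
struct Background;

// A zero-sized token a function takes to declare which thread it runs on.
struct OnThread<Thread>(PhantomData<Thread>);

// Data tagged with the thread it must be accessed from.
struct OwnedBy<T, Thread> {
    value: T,
    _thread: PhantomData<Thread>,
}

impl<T, Thread> OwnedBy<T, Thread> {
    fn new(value: T) -> Self {
        OwnedBy { value, _thread: PhantomData }
    }

    // Access requires the matching token; passing the wrong one fails to compile.
    fn get(&self, _here: &OnThread<Thread>) -> &T {
        &self.value
    }
}

// Moving this body to another thread means changing the token type, and the
// compiler then flags every access that no longer belongs there.
fn poll_queue(here: &OnThread<HotPath>, depth: &OwnedBy<u32, HotPath>) -> u32 {
    *depth.get(here)
}

fn main() {
    let here = OnThread::<HotPath>(PhantomData);
    let depth = OwnedBy::<u32, HotPath>::new(7);
    println!("queue depth: {}", poll_queue(&here, &depth));
}
```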
Yes, compile-time type-checking can be replaced with a sufficient quantity of runtime type-checking in the form of unit tests.
However, the compiler's type-checking is automatic, so you don't have to write a ton of unit tests to get its benefit, and the error messages are usually much better.
I've yet to see a dynamically typed codebase that has sufficient unit tests to replace a decent compile-time type-system.
One other point that matters is time: running 20k unit and integration tests to verify the types are all correct for your 6kloc will take much longer than compiling 6kloc, and the compiler does a much better job of updating incrementally based on code changes than any unit testing framework I've used.
> One other point that matters is time: running 20k unit and integration tests to verify the types are all correct for your 6kloc will take much longer than compiling 6kloc
I'll disagree here. The kind of thing you can verify at compile-time can be tested very quickly, and compilers with less useful type systems can be much faster. All in, the tests are probably faster to run.
But yeah, compilers do a better job at incremental validation. That's probably intrinsic; type checking lends itself to incremental work much more naturally than tests do.
> The kind of thing you can verify at compile-time can be tested very quickly
I’m not sure I entirely agree with this. For example, a type system can ensure a function is always called with the right units, but testing that every use of the function is correct might be a lot more tedious, depending on the code base.
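For instance, a minimal hypothetical sketch with unit newtypes: the compiler rejects a swapped argument at every call site, whereas tests would have to cover each call site one by one.

```rust
// Hypothetical unit newtypes: mixing them up is a compile error at every call site,
// with no per-call-site test needed.
struct Meters(f64);
struct Seconds(f64);

fn speed_m_per_s(distance: Meters, time: Seconds) -> f64 {
    distance.0 / time.0
}

fn main() {
    println!("{}", speed_m_per_s(Meters(100.0), Seconds(9.58)));
    // speed_m_per_s(Seconds(9.58), Meters(100.0)); // error[E0308]: mismatched types
}
```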
> and compilers with less useful type systems can be much faster.
This depends very heavily on the language (e.g., OCaml has a relatively complex type system yet compiles fairly quickly), but I'd agree that what you said appears to be the general trend.
The number of subtle bugs and problems that proptest was able to point out to me was terrific.
Property-based testing is definitely the tool you want to use to thoroughly test a complex library that needs to maintain its invariants.
The interesting bit is that if you approach the problem with ordinary unit tests, you will mostly write tests that pass. If you approach test writing as property testing, you end up writing much harder test cases.
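For a flavour of what that looks like with proptest, here is a minimal sketch (the function and the property are made up for illustration):

```rust
use proptest::prelude::*;

// A deliberately simple invariant: "normalizing" is idempotent and preserves length.
fn normalize(mut v: Vec<i32>) -> Vec<i32> {
    v.sort();
    v
}

proptest! {
    #[test]
    fn normalize_is_idempotent(v in prop::collection::vec(any::<i32>(), 0..100)) {
        let once = normalize(v.clone());
        let twice = normalize(once.clone());
        prop_assert_eq!(once.len(), v.len());
        prop_assert_eq!(once, twice);
    }
}
```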
That is the project we are trying to support with our work. Unfortunately, this means I have to solve lots of hard problems like scaling, usability, modular verification, etc.
I'm pleased to see that "verification" isn't just relying on random value generators for property testing in this case. While I agree with the author that actually running code is still useful, simple specifications have the power to be much more than glorified fuzzing - and Rust is the perfect language to start pushing that in. The use of native syntax for int value ranges is especially nice.
The main question is: how does this compare to other Rust verification efforts[0]? Having to write specifications in test crates rather than next to the function definitions is probably going to be a big obstacle to multi-crate verification.
This is more at the automatic end of the verification-tool spectrum, like KLEE, SMACK, etc.
Auto-active tools like Prusti are super interesting too, and I suspect that we need a hybrid approach.
From what I understand of Prusti, it still aims to be automatic and the underlying Viper pipeline still relies on constraint solving for the most part - certainly the Prusti examples don't seem to do anything more complex than that.
Would you be able to outline the main differences?
The difference is really about the level of annotation you provide. In testing (and the more automated verifiers) you might add assertions. In the “auto-active” tools, you add more annotation: function contracts, invariants, maybe hints to help find a proof, etc.
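Roughly, and purely as a sketch, the contrast looks something like this, using the contract attributes from the prusti-contracts crate on a made-up function:

```rust
use prusti_contracts::*;

// "Auto-active" style: the specification lives on the function as a contract,
// and the verifier is asked to discharge it for all inputs.
#[requires(divisor != 0)]
#[ensures(result <= dividend)]
fn div_floor(dividend: u32, divisor: u32) -> u32 {
    dividend / divisor
}

// Testing / automated-verifier style: the same intent expressed as an
// assertion at one particular call site.
fn main() {
    let q = div_floor(10, 3);
    assert!(q <= 10);
}
```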
It is definitely more natural with property-based testing, but I think it works to some extent as a perspective in general.
You have a huge input space where each point represents some combination of possible inputs. You can think of each point as mapping to 0 for correct behaviour and 1 for bad behaviour. Your verifying and testing is an attempt to show, as best you can, that the sum across all the inputs is 0.
For fixed-example testing, you sample some points that you consider representative or boundary cases, show that they are 0, and use that to argue that the whole space is 0. With property testing, you literally let it do the Monte Carlo thing. With verification, you attempt to do the integration analytically.
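Stated a bit more formally (just restating the analogy above; the notation is invented here):

```latex
% Let f flag bad behaviour for each point x of the input space X:
\[
  f : X \to \{0,1\}, \qquad f(x) = 1 \iff \text{input } x \text{ misbehaves}.
\]
% Correctness is the claim that the whole space sums to zero:
\[
  \sum_{x \in X} f(x) = 0.
\]
% Example-based tests check f(x_i) = 0 at hand-picked points x_1, ..., x_n;
% property testing estimates (1/N) \sum_{i=1}^{N} f(X_i) from random samples (Monte Carlo);
% verification argues f(x) = 0 for all x analytically.
```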
I think that just as reliability engineering benefited from moving from 'keep the system always up' (but fail occasionally in the real world) to 'keep the system up 99.999% of the time' (and succeed in a measurable way), fixing bugs in software might benefit from a similarly honest and quantitative approach.
I was under the impression that burntsushi's implementation of quickcheck [0] was the most popular property-based testing library for Rust. Was I mistaken or was there a reason the authors chose another library as inspiration?
As a previous user of quickcheck and now user of proptest, I can say that proptest feels much better to use because it doesn't tie quite as directly to a single type (as a sibling comment mentions).
With quickcheck, I had to define `MyTypeWithAnOddOffset` and `MyTypeWithAnEvenOffset` (or equivalent formulation) and composing things was tedious or hard.
burntsushi's quickcheck only uses the type to generate values.
Proptest lets you constrain the values more precisely.
E.g. you can create a BTreeMap with up to 5 entries where the keys are some subset of the ints (say) and the values are some subset of some other type.
This can capture many of the constraints of a function's precondition and can be implemented efficiently in both fuzzers and verifiers.
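In proptest terms, that composition is just a strategy expression; a rough sketch (the bounds and value strategy are arbitrary choices):

```rust
use proptest::collection::btree_map;
use proptest::prelude::*;

proptest! {
    #[test]
    // Maps with at most 5 entries, keys drawn from 0..100, values from a
    // short-lowercase-string regex strategy.
    fn handles_small_maps(m in btree_map(0..100i32, "[a-z]{0,8}", 0..=5usize)) {
        prop_assert!(m.len() <= 5);
        prop_assert!(m.keys().all(|k| (0..100).contains(k)));
    }
}
```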
It would be cool if you could also use the exact same interface for coverage-based fuzzing, creating a unified property-testing/fuzzing/formal-verification interface.