I did this for a decently complex distributed system for embedded devices, and it practically saved my life.
It was in C and I didn't do any language extending/precompiling, but I had interfaces for everything related to I/O, execution actors, randomness, etc.
On target hardware everything used TCP/UDP/real disk, pthreads, normal rand sources, etc. In simulation everything used virtual networking, a single simulation thread stepping all event loops, test-specified random seeds for reproducibility, etc.
It is completely invaluable. I can concisely write completely deterministic system tests that exercise tens of thousands of lines of code. I can fuzz test actor scheduling, I/O problems like dropped packets/msgs, and everything else you can think of. I can run the entire test suite in valgrind and other nice tools. I can put a big machine in a corner of the office to fuzz the suite for weeks on end, have it email me when a test fails, and have it tell me exactly which random seed to use to reproduce the failure myself within minutes. I can debug the entire simulation perfectly in GDB.
I've barely begun to describe how great this is to have, how many bugs these tests have caught, or what a reliable regression suite one can build. It doesn't replace testing on target - I do that extensively as well. Big system scenario tests don't replace smaller module and unit tests - I do those too. But deterministic simulation testing saved my sanity. Don't hesitate to evaluate this approach if you're doing something similar.
You actually have a very pleasant presentation style, it drew me in and I enjoyed every minute of it. Though perhaps it's just because I find the subject and your solution very interesting.
What's your standpoint on formal verification? Did you guys think about it and reject it?
Great stuff! Would it be fair to characterize this in software architecture terms as taking your components (under test) and using different connectors that model the real world?
http://db.cs.berkeley.edu/papers/dbtest12-bloom.pdf has a really nice extension of this idea: since their language is unordered by default and has only a few explicitly ordered primitives, they can use an SMT solver to determine which event interleavings cannot possibly affect the result. This lets them generate schedules that explore the space of possible schedules more efficiently and find bugs faster.
This talk is awesome. The idea that the simulation framework becomes part of the production code is brilliant. I wonder if these ideas could be merged with formal methods, so that perhaps a model of the simulation could be generated, and then through model analysis it could generate simulation stories for itself that humans might overlook.