Show HN: Maelstrom – A Hermetic, Clustered Test Runner for Python and Rust

jitl · 2024-07-10T01:56:17 1720576577

Why did you build this? Any particular motivating experience? At $WORK, we use Jest to run each test suite in its own process. The places we struggle with isolation are for integration tests that read/write from the DB, so I’m curious what kinds of issues motivated container-per-test.

Are you going to sell it somehow?

Does it work nested inside Docker? I already have a CI setup that stripes batches of tests across worker containers.

nfachan · 2024-07-10T03:38:43 1720582723

At my previous company, we had a lot of tests. They were a mix of C and Python. Running all of them on a single machine took on the order of an hour or more. Even just limiting the tests run to those that could theoretically be affected by your change could take minutes or even tens of minutes.

We ended up building a shared cluster of ~1000 cores that was available to all developers, and that was used by CI. This changed our developers' workflows quite a bit. It was now possible to run large numbers of tests regularly: every few minutes instead of once or twice a day. This in turn encouraged developers to write more tests and do more test-driven development.

On top of that, having the cluster available provided other benefits. If a test was flakey, it was easy to run it tens or even hundreds of thousands of times, making it easy to reproduce and identify the bug. We also occasionally did Monte Carlo simulations, and it was really handy to have a lot of cores available for general developer use.

I got used to working that way and I've missed it since I left that company. So this project is an attempt to make a more general-purpose implementation of that system. I hope others will find similar workflows that make them more productive using this system or something like it.

Regarding the container-per-test idea: it really comes about because it's the obvious way to package up jobs to submit them to a cluster. Plus, it makes tests reproducible for all developers in a project, and between developer machines and CI. Using Linux namespaces, the overhead of running tests in individual containers isn't much more than running tests in individual processes.

nfachan · 2024-07-10T03:42:46 1720582966

I forgot to answer your two other questions.

Maelstrom is open source and we plan to keep it that way. We may look at ways of selling access to a hosted cluster as a service. Test running is very elastic, and could benefit from having an elastic service to support it.

Maelstrom is completely root-less, so it'll work inside of Docker just fine. We regularly test Maelstrom within Maelstrom.

nine_k · 2024-07-10T02:45:01 1720579501

A container implementation that depends neither on Docker nor on runc is at the very least interesting, by itself.

nfachan · 2024-07-10T04:09:05 1720584545

If you're interested, the main part of the container implementation is here: https://github.com/maelstrom-software/maelstrom/blob/main/cr...

For each test we run, we clone the worker process, then make a bunch of Linux syscalls to set everything up for the container, then exec the test. We use the trick of having the child process share the virtual memory of the parent until the test is exec'ed.

We also use a technique where we build up a "program" of simple operations (each operation more or less maps to a syscall) in the parent before cloning, then evaluate the program in the child. This gives us the same performance benefits of using posix_spawn or vfork, but lets us configure all of the namespace stuff while we're spawning.

The code that's run in the child can be found here: https://github.com/maelstrom-software/maelstrom/blob/main/cr...

amluto · 2024-07-10T02:51:04 1720579864

It’s very easy to write one. I’ve done it in half an hour in bash. (Most of the half hour was spent cursing at various versions of util-linux that were broken in creative ways.)

Doing it well is a different story.

nine_k · 2024-07-10T03:16:59 1720581419

Well, yes, chroot, cgroups, mount --bind, and some ipfw / iptables stuff are enough to create a makeshift container.

I hope these guys are into doing it well, else runc would be more than adequate for low-level stuff.

amluto · 2024-07-10T04:30:22 1720585822

If anyone is doing it from scratch, in a real programming language (which, for better or for worse, seems to currently mean C or Go or futzing with raw syscalls over FFI), one shouldn’t use chroot or the mount syscall. The new mount API is much better.

Cgroups are nice and add some fun features, but they’re just icing on the cake and are also not necessary, even for a very functional and nicely secure container, unless the stuff inside the container needs cgroup delegation.

Using iptables to make a container is IMO pathetic, and I’m hoping to find time at some point to work out something better.

Joker_vD · 2024-07-10T14:18:21 1720621101

> The new mount API

Could you please tell what exactly this API is? I'd like to try and use it.

amluto · 2024-07-11T02:38:39 1720665519

open_tree() and related APIs. I’m not sure why the manpages never seem to have been applied, but they’re available from old posts:

https://lwn.net/Articles/829496/

And here’s an article about an old version of the syscalls:

https://lwn.net/Articles/759499/

nfachan · 2024-07-11T19:19:25 1720725565

We use our own small wrappers for these syscalls, built on top of Rust's libc crate. All our wrappers live here:

https://github.com/maelstrom-software/maelstrom/blob/main/cr...

For bind mounts, you want to look at open_tree and move_mount. For "regular" mounts, you want to look at fsopen, fsconfig, fsmount, and move_mount.

I found this video very useful: https://www.youtube.com/watch?v=gMWKFPnmJSc

nextaccountic · 2024-07-10T03:29:22 1720582162

For Rust tests, can Maelstrom be combined with nextest [0]? (Maelstrom provides the cargo maelstrom command, and nextest provides the cargo nextest command, with no obvious way to compose them)

I guess an env variable to specify which test command to run (very low level) or something like cargo maelstrom --nextest would work (but then how to compose with other test runners?)

Now,

> It's fast. In most cases, Maelstrom is faster than cargo test, even without using clustering.

That's surprising. Why is this the case?

Will Maelstrom without clustering (running on a single machine) be faster than nextest as well? (Nextest is also faster than bare cargo test [1])

Would combining nextest with Maelstrom bring further performance benefits, or is Maelstrom already doing whatever improvements nextest does?

[0] https://nexte.st/

[1] https://nexte.st/docs/design/how-it-works/

nfachan · 2024-07-10T03:59:16 1720583956

Our desire is to provide a general test-running and job-running framework, not to build the world's next test runner. We think nextest is great, and we were inspired by some of the things they did.

We've designed Maelstrom to be usable as a library. So you can build your own test runner or job runner. We've been in contact with Rain, the primary developer of nextest, regarding how we can make it so that nextest can use Maelstrom. We'd love nothing more than to have nextest be Maelstrom-ized (Maelstrom-ified?).

We definitely have a little bit of work to do, but we plan to make big steps with the API for the next release. Currently, the client library doesn't give per-test updates until the test finishes. This means that you don't know how long a test is taking to run until it's completed (though we do provide a timeout feature). This is fine for our currently limited UI, but is probably insufficient for nextest.

Maelstrom in standalone mode running on a single machine is usually a bit slower than nextest. Maelstrom and nextest are similar in that they both run each test in its own process, and they both do a good job of running enough test processes in parallel to keep the machine busy. Maelstrom has to do a little bit more work each time it starts a new process to set up the namespaces, so it's always going to be a bit slower than nextest, but not by much.

One thing that Maelstrom does that I don't think nextest does is to use Longest Processing Time First (LPT) scheduling (https://en.wikipedia.org/wiki/Longest-processing-time-first_...). When the runtimes of tests vary a lot within a project, using LPT can result in big wins and more predictable runtimes. Maelstrom itself actually has some pretty long-running integration tests, and once we added LPT, running Maelstrom tests on Maelstrom is usually faster than running them on nextest. But again, we're not talking about huge differences in single-machine cases.
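
To make the LPT point concrete, here is a small simulation of the textbook rule, not Maelstrom's actual scheduler. The function names and the test durations are made up for illustration, and it assumes the runtimes are known up front (in practice they would have to be estimated, for example from earlier runs).

```python
import heapq

def greedy_makespan(jobs, workers):
    """Hand each job, in the given order, to the worker that frees up first.

    jobs is a list of (name, seconds) pairs; returns (assignment, makespan).
    """
    heap = [(0, w) for w in range(workers)]  # (busy-until time, worker index)
    assignment = [[] for _ in range(workers)]
    for name, seconds in jobs:
        busy_until, w = heapq.heappop(heap)
        assignment[w].append(name)
        heapq.heappush(heap, (busy_until + seconds, w))
    return assignment, max(t for t, _ in heap)

def lpt_makespan(jobs, workers):
    """LPT: the same greedy rule, but with jobs sorted longest-first."""
    ordered = sorted(jobs, key=lambda job: job[1], reverse=True)
    return greedy_makespan(ordered, workers)

# Hypothetical durations in seconds, with one slow integration test listed last.
jobs = [("a", 3), ("b", 3), ("c", 2), ("d", 2), ("integration", 10)]
_, in_order = greedy_makespan(jobs, workers=2)
_, longest_first = lpt_makespan(jobs, workers=2)
print(in_order, longest_first)  # prints "15 10"
```

With the slow test started last, one worker sits idle while the other finishes it, so the run takes 15 seconds. Starting it first lets the four short tests fill the second worker in the meantime, and the run finishes in 10.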

I think cargo test is usually slower than both Maelstrom and nextest for the reasons described in the nextest documentation: cargo test doesn't always keep enough test threads running to keep the machine busy. However, if you have a lot of really small tests all in a single crate, then cargo test can and does outperform both Maelstrom and nextest. The clap project (https://github.com/clap-rs/clap) is a good example of this.

I think Maelstrom does most of the performance things that nextest does. However, nextest obviously has a lot more features and integrations than Maelstrom.

geekodour · 2024-07-10T07:51:43 1720597903

at first i thought it was about https://github.com/jepsen-io/maelstrom/tree/main which shares the same name

electric_mayhem · 2024-07-09T23:21:21 1720567281

Came here for the 90s shareware game from Ambrosia for the Mac.

I guess this is cool too though.

esafak · 2024-07-10T03:29:05 1720582145

There was also a development company called Maelstrom Games, known for https://en.wikipedia.org/wiki/Midwinter_(video_game)