Very interesting! They say they went from a stack-based IR, which provided faster translation but slow execution time, to a register-based one, which has slightly slower translation but faster execution time. This is in contrast with other runtimes, which have much slower translation times but very fast execution via JIT.
I assume that for very short-lived programs, a stack based interpreter could be faster. And for long-lived programs, a JIT would be better.
This new IR seems to target a sweet spot of short-lived, but not extremely short-lived, programs. Also, it's a great alternative for environments where JIT is not possible.
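To make the trade-off concrete, here is a toy sketch (not Wasmi's actual IR) of the same expression `(a + b) * c` encoded for a stack machine and for a register machine. The stack encoding needs five dispatched instructions, the register encoding only two, which is roughly why register IRs execute faster once translated.

```rust
// Toy instruction sets for illustration only; the names and
// encodings are made up, not taken from Wasmi.

#[derive(Clone, Copy)]
enum StackOp {
    Local(usize), // push the value of a local/parameter
    Add,          // pop two values, push their sum
    Mul,          // pop two values, push their product
}

#[derive(Clone, Copy)]
enum RegOp {
    Add { dst: usize, lhs: usize, rhs: usize },
    Mul { dst: usize, lhs: usize, rhs: usize },
}

fn run_stack(code: &[StackOp], locals: &[i64]) -> i64 {
    let mut stack = Vec::new();
    for op in code {
        match *op {
            StackOp::Local(i) => stack.push(locals[i]),
            StackOp::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
            StackOp::Mul => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a * b);
            }
        }
    }
    stack.pop().unwrap()
}

fn run_reg(code: &[RegOp], regs: &mut [i64]) -> i64 {
    for op in code {
        match *op {
            RegOp::Add { dst, lhs, rhs } => regs[dst] = regs[lhs] + regs[rhs],
            RegOp::Mul { dst, lhs, rhs } => regs[dst] = regs[lhs] * regs[rhs],
        }
    }
    regs[regs.len() - 1]
}

fn main() {
    // locals/registers: a = 2, b = 3, c = 4, plus one scratch register
    let stack_code = [
        StackOp::Local(0),
        StackOp::Local(1),
        StackOp::Add,
        StackOp::Local(2),
        StackOp::Mul,
    ];
    let reg_code = [
        RegOp::Add { dst: 3, lhs: 0, rhs: 1 },
        RegOp::Mul { dst: 3, lhs: 3, rhs: 2 },
    ];
    assert_eq!(run_stack(&stack_code, &[2, 3, 4]), 20);
    assert_eq!(run_reg(&reg_code, &mut [2, 3, 4, 0]), 20);
}
```

The flip side, of course, is that producing the register code requires a smarter translation pass (mapping stack slots to registers), which is where the slightly slower translation time comes from.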
I'm happy to see all these alternatives in the Wasm world! It's really cool. And thanks for sharing!
Even faster startup times can be achieved by so-called in-place interpreters, which do not translate the Wasm binary at all and instead execute it directly, without adjustments. Obviously, this is slower at execution time compared to rewriting interpreters.
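A minimal sketch of the in-place idea, with a made-up one-byte opcode format rather than real Wasm encoding: the interpreter walks the byte stream directly, so there is zero translation cost, but opcode and immediate decoding is repeated on every execution.

```rust
// Toy in-place interpreter; the opcodes below are invented for
// illustration and are not the real Wasm binary format.

const OP_CONST: u8 = 0x01; // followed by a 1-byte immediate
const OP_ADD: u8 = 0x02;
const OP_END: u8 = 0x0B;

fn interpret_in_place(binary: &[u8]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    loop {
        let op = binary[pc];
        pc += 1;
        match op {
            OP_CONST => {
                // Decoding the immediate happens on *every* run; a
                // rewriting interpreter would have done this once,
                // up front, during translation.
                stack.push(binary[pc] as i64);
                pc += 1;
            }
            OP_ADD => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
            OP_END => return stack.pop().unwrap(),
            _ => panic!("unknown opcode"),
        }
    }
}

fn main() {
    let binary = [OP_CONST, 2, OP_CONST, 40, OP_ADD, OP_END];
    assert_eq!(interpret_in_place(&binary), 42);
}
```

With real Wasm the repeated work is worse than shown here, since immediates are LEB128-encoded and control flow requires scanning for matching `end` opcodes, which is exactly the work a translation pass amortizes away.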
As there seem to be even more runtimes popping up (great job with wasmi, btw), it seems like a fun, maybe even full-time, job to keep up with them all.
Wasmi is an independent project nowadays. And you are right that it was originally designed for efficient smart contract execution but with a scope for more general use.
I see that this sentence was a bit vague. What was meant is that Wasmi, as a Rust library, mirrors the Wasmtime API, which can be found here: https://docs.rs/wasmtime/21.0.1/wasmtime/
But yes, Wasmi also supports WASI preview1 and can execute Wasm applications that have been compiled in compliance with WASI preview1.
This is the sort of iterative process other interpreters have gone through as well. The balance between startup speed and memory usage leads to bytecode, and the stack-based interpreter is replaced by a register-based one. Later (though not yet in this project), the bytecode is replaced with a list of addresses, and then these are converted to machine code and inlined, in the simplest possible form of compilation. Next, code is generated for branches to make loops and flow control fast, and we eventually get back to a JIT. That gets saved to a binary image, and we have a compiler.
Yes, this iterative process is indeed very visible. Wasmi started out as a mostly-safe Rust interpreter and over time moved more and more in a performance-oriented direction.
Though I have to say that the "list of addresses" approach is not optimal in Rust today, since Rust is missing explicit tail calls. Stitch applies some tricks to achieve tail calls in Rust, but these have drawbacks that are discussed in detail in Stitch's README.
Furthermore, the "list of addresses" approach (also known as threaded-code dispatch) comes in several variants. From what I know, both Wasm3 and Stitch use direct threaded code, which stores a list of function pointers to instruction handlers and uses tail calls or computed-goto to fetch the next instruction. The downside compared to bytecode is that direct threaded code uses more memory, and it is only faster when coupled with computed-goto or tail calls. Otherwise, compilers nowadays are pretty solid in their optimizations of loop-switch constructs and could technically even generate computed-goto-like code.
Thus, due to the lower memory usage, the downsides of using tail calls in Rust, and the potential of compiler optimizations for loop-switch constructs, we went with the bytecode approach in Wasmi.
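The two dispatch styles being compared can be sketched as follows, on a toy instruction set (names invented for illustration). The loop-switch form is the one compact bytecode engines rely on; the "threaded" form stores a function pointer per instruction (more memory per instruction), and without guaranteed tail calls in Rust it degenerates into a trampoline that still needs a driving loop, which erodes its advantage.

```rust
#[derive(Clone, Copy)]
enum Op {
    Const(i64),
    Add,
    Halt,
}

struct Vm {
    stack: Vec<i64>,
}

// 1) loop-switch dispatch over compact bytecode: the compiler can
// optimize this match into a jump table
fn run_switch(code: &[Op]) -> i64 {
    let mut vm = Vm { stack: Vec::new() };
    let mut pc = 0;
    loop {
        match code[pc] {
            Op::Const(v) => vm.stack.push(v),
            Op::Add => {
                let (b, a) = (vm.stack.pop().unwrap(), vm.stack.pop().unwrap());
                vm.stack.push(a + b);
            }
            Op::Halt => return vm.stack.pop().unwrap(),
        }
        pc += 1;
    }
}

// 2) "threaded" dispatch: one function pointer per instruction,
// stored alongside its immediate operand
type Handler = fn(&mut Vm, i64) -> bool; // returns false on halt

fn h_const(vm: &mut Vm, imm: i64) -> bool {
    vm.stack.push(imm);
    true
}
fn h_add(vm: &mut Vm, _: i64) -> bool {
    let (b, a) = (vm.stack.pop().unwrap(), vm.stack.pop().unwrap());
    vm.stack.push(a + b);
    true
}
fn h_halt(_: &mut Vm, _: i64) -> bool {
    false
}

fn run_threaded(code: &[(Handler, i64)]) -> i64 {
    let mut vm = Vm { stack: Vec::new() };
    // Without explicit tail calls, each handler must return to this
    // trampoline loop instead of jumping straight to the next
    // handler, so it isn't true threaded code.
    for &(handler, imm) in code {
        if !handler(&mut vm, imm) {
            break;
        }
    }
    vm.stack.pop().unwrap()
}

fn main() {
    let bytecode = [Op::Const(40), Op::Const(2), Op::Add, Op::Halt];
    let threaded: [(Handler, i64); 4] =
        [(h_const, 40), (h_const, 2), (h_add, 0), (h_halt, 0)];
    assert_eq!(run_switch(&bytecode), 42);
    assert_eq!(run_threaded(&threaded), 42);
}
```

Note the memory difference: each threaded entry carries a full function pointer plus operand, whereas a bytecode `Op` can be packed much more tightly.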
I love all the improvements that Wasmi has been making lately. Being so close to the highly optimized interpreter Stitch (a new interpreter similar to Wasm3, but written in Rust) is quite impressive.
As a side note, I wish most of the Rust runtimes would stop adopting the "linker" paradigm for imports, as it is a completely unnecessary abstraction when setting up imports has close-to-zero cost.
When using lazy-unchecked translation with relatively small programs, setting up the Linker can sometimes take up the majority of the overall execution time with ~50 host functions (which is a common average number). We are talking about microseconds, but microseconds come to matter at these scales. This is why for Wasmi we implemented the LinkerBuilder, for a 120x speed-up. :)
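A rough sketch of why this matters (this is not the actual wasmi `LinkerBuilder` API, just a hypothetical illustration with a plain map): registering ~50 host functions means ~50 name hashes and allocations, and doing that per instantiation dominates when the program itself only runs for microseconds. Building the import table once and reusing it amortizes that cost.

```rust
use std::collections::HashMap;

// Hypothetical host function signature for illustration only.
type HostFn = fn(&[i64]) -> i64;

fn host_noop(_: &[i64]) -> i64 {
    0
}

// Naive approach: rebuild the whole import map for every
// instantiation, paying ~50 insertions each time.
fn build_linker(names: &[String]) -> HashMap<String, HostFn> {
    let mut linker = HashMap::new();
    for name in names {
        linker.insert(name.clone(), host_noop as HostFn);
    }
    linker
}

fn main() {
    let names: Vec<String> = (0..50).map(|i| format!("host_fn_{i}")).collect();

    // Amortized approach: build once up front...
    let prebuilt = build_linker(&names);

    // ...then reuse it for every instantiation instead of
    // re-registering all 50 functions each time.
    for _ in 0..3 {
        let linker = &prebuilt;
        assert_eq!(linker.len(), 50);
        assert_eq!((linker["host_fn_0"])(&[]), 0);
    }
}
```

The real `LinkerBuilder` presumably does more than a shared map (e.g. precomputed lookup structures), but the build-once/reuse-many shape is the core of the speed-up.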
I'm curious if you tried Wizard. A cursory look at some of the benchmarks suggests that at least some have dependencies other than WASI. How did you run them on the other engines?
I am aware of Wizard and I think it is a pretty interesting Wasm runtime. It would be really cool if it was part of Wasmi's benchmark testsuite (https://github.com/wasmi-labs/wasmi-benchmarks). Contributions to add more Wasm runtimes and more test cases are very welcome.
The non-WASI test cases are only for testing translation performance, thus their imports do not need to be satisfied. That would only be necessary if the benchmarks tested instantiation performance instead. Usually, instantiation is pretty fast for most Wasm runtimes compared to translation time.
For startup performance Wasmi and Wasm3 are both the fastest engines according to the benchmarks.
For execution performance you are right that generally JIT engines are expected to be faster than interpreter based Wasm engines.
Also, as stated in the article, on Apple silicon Wasmi currently performs kinda poorly, but this will be improved in the future.
On AMD server chips Wasmi is the fastest Wasm interpreter.
No, I do not, but it is a very interesting question, and probably not even answerable in practice, because not every instruction takes the same amount of time to execute to completion. Outliers in this regard are, for example, host function calls, which could do arbitrary things on the host side, or bulk-memory operations, which scale linearly with their inputs, etc.
Someone should make a programming language called ‘ever’ with a reeeally slow interpreter, so that everyone else can now legit claim that their language is faster than ever.
I know it's a joke, but it makes me wonder: is there such a constant-speed type of language? Maybe its speed would be enforced by an emulator, but it would be interesting to have a known speed to measure against.