
It depends. In my experience generally GraalVM is slower to start, but after a few iterations it can be as fast or faster, and the same results can be amplified if you use the JVM instead of AOT (i.e. it's even slower to start, but eventually it can be much faster).

Of course this all depends on your specific code and use cases.




Truffle/Graal is also able to do some insanely cool things related to optimizing across FFI boundaries: if you have a Java program that uses the Truffle JavaScript engine for scripting, Truffle is able to do JIT optimization transparently across the FFI boundary, so it has zero overhead. IIRC they even have a special API that exposes a native C library to the runtime in a way that lets it optimize away a lot of FFI overhead or inline the native function into the trace. They were advertising this "Polyglot VM" functionality a lot a few years ago, although now their marketing mostly focuses on the Native Image part (which helps a lot with the slow startup you mention).
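
For anyone who hasn't used it, the embedding side of this is GraalVM's polyglot API. A minimal sketch (assuming you run on a GraalVM with the JS language installed; the shared counter array is just an illustration):

    import org.graalvm.polyglot.Context;
    import org.graalvm.polyglot.Value;

    public class PolyglotDemo {
        public static void main(String[] args) {
            int[] counter = new int[]{0}; // plain Java object shared with JS

            try (Context context = Context.newBuilder("js")
                    .allowAllAccess(true) // let JS touch host objects
                    .build()) {
                // Expose the Java array to JavaScript under a global name.
                context.getBindings("js").putMember("counter", counter);

                // Once hot, the JS function and the Java call site end up
                // in the same Graal compilation unit, so the host/guest
                // boundary largely disappears.
                Value inc = context.eval("js", "(function () { counter[0]++; })");
                for (int i = 0; i < 1_000_000; i++) {
                    inc.executeVoid();
                }
                System.out.println(counter[0]); // 1000000
            }
        }
    }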

TruffleRuby even had the extremely big brain idea of running a Truffle interpreter for C for native extensions instead of actually compiling them to native code, just so that Truffle can optimize transparently across FFI boundaries. https://chrisseaton.com/truffleruby/cext/


I don't have anything to contribute to the Truffle discussion, but for those not familiar: Chris Seaton was an active participant on Hacker News, until his tragic death in late 2022. Wish he was still with us.

https://news.ycombinator.com/threads?id=chrisseaton

https://news.ycombinator.com/item?id=33893120


Yes, it's extremely sad :( He was a giant in the Ruby and Truffle communities, and TruffleRuby was a monumental work for both projects.


> TruffleRuby even had the extremely big brain idea of running a Truffle interpreter for C for native extensions […]

TruffleC was a research project and the first attempt at running C code on Truffle that I'm aware of. It directly interpreted C source code, and while that works for small self-contained programs, you quickly run into a lot of problems as soon as you want to run larger real-world programs. You need everything, including the C library, available as pure C code, and you have to deal with the fact that a lot of C code relies on undefined or implementation-defined behavior. In addition, your C parser has to fully adhere to the C standard, and once you want to support C++ too (because a lot of code is written in C++), you have to start over from scratch. I don't know if TruffleC was ever released as open source.

The next / current attempt is Sulong, which uses LLVM to compile C/C++/Rust/… to LLVM IR ("bitcode") and then directly interprets that bitcode. It's a lot better, because you don't have to write your own complete C/C++/… parser/compiler, but bitcode still has various limitations. Essentially, as soon as the program uses handwritten assembly somewhere, or does low-level things like setjmp/longjmp, things get hairy pretty quickly. Bitcode itself is also platform dependent (think of constants/macros/… that get expanded during compilation), you still need all code / libraries as bitcode, each language uses a slightly different set of IR nodes and requires a different runtime library so each has to be supported explicitly, and even then you can't make it fully memory safe because typical programs will just break. In addition, the optimization level chosen when compiling the source program can result in very different bitcode with very different IR nodes, some of which were not supported for a long time (e.g., everything related to vectorization). Sulong can load libraries and expose them via the Truffle FFI, and it can be used for C extensions in GraalPython and TruffleRuby AFAIK. It's open source [1] and part of GraalVM, so you can play around with it.
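
If you want to play with it from the embedder side, a rough sketch (assuming a GraalVM with the LLVM runtime installed; the file name and exported function here are made up for the example):

    import java.io.File;
    import java.io.IOException;

    import org.graalvm.polyglot.Context;
    import org.graalvm.polyglot.Source;
    import org.graalvm.polyglot.Value;

    public class SulongDemo {
        public static void main(String[] args) throws IOException {
            // Bitcode from e.g.:  int square(int x) { return x * x; }
            // built with:         clang -c -emit-llvm -o libsquare.bc square.c
            Source bitcode = Source.newBuilder("llvm", new File("libsquare.bc")).build();

            try (Context context = Context.newBuilder("llvm").allowAllAccess(true).build()) {
                // Evaluating the bitcode loads it as a "library" whose
                // members are its exported functions.
                Value library = context.eval(bitcode);
                Value square = library.getMember("square");
                System.out.println(square.execute(7).asInt()); // 49
            }
        }
    }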

Another research project was then to directly interpret AMD64 machine code and emulate a Linux userspace environment, because that would solve all the problems with inline assembly and language compatibility. Although that works, it has an entirely different set of problems: Graal/Truffle is simply not made for this type of code, and as a result the performance is significantly worse than Sulong's. You also end up re-implementing the Linux syscall interface in your interpreter, you have to deal with all the low-level memory features that are available on Linux like mmap/mprotect/... and they have to behave exactly as on a real Linux system, and you can't easily export subroutines via the Truffle FFI in a way that also works with foreign language objects. It does work with various guest languages like C/C++/Rust/Go/… without modifying the interpreter, as long as the program is available as a native Linux/AMD64 executable and doesn't use any of the unimplemented features. This project is also available as open source [2], but its focus has somewhat shifted to using the interpreter for execution-trace-based program analysis.

Things none of these projects fully support, AFAIK, are multithreading, multiprocessing, IPC, and so on. Sulong partially works around this by calling into the native C library loaded in the VM for subroutines that aren't available as bitcode and aborting on certain unsupported calls like fork/clone, but then you obviously lose the advantage of having everything in the interpreter.

The conclusion is: however you try to interpret C/C++/… code, get ready for a world of pain and incompatibilities if you intend to run real-world programs.

[1] https://github.com/oracle/graal/tree/master/sulong

[2] https://github.com/pekd/tracer/tree/master/vmx86


> generally GraalVM is slower to start, but after a few iterations it can be as fast or faster

That's even true of the standard HotSpot-backed JVM. I've rewritten programs from Java to JS and made them faster, because they were short-lived: the JVM's slow startup chews through the time budget and is never offset by any of the theoretical speedups the JVM otherwise promises.


> In my experience generally GraalVM is slower to start, but after a few iterations it can be as fast or faster

That probably has to do with the JVM being optimized for long-running server processes.

> the same results can be amplified if you use the JVM instead of AOT (i.e. it's even slower to start, but eventually it can be much faster)

OpenJ9 has a caching JIT server that (theoretically) would work around this.


I’m assuming that’s a benchmark that runs a lot of hot code? I wonder why the startup is slower if it’s based on an interpreter, since all current VMs start that way too.


It's because:

• GraalJS is today based on an AST interpreter, and AST interpreters are in general less efficient than bytecode interpreters. V8 starts by compiling JavaScript to an internal bytecode, interpreting that, then JIT compiling the hot spots. It's the same architecture as the JVM, except that the bytecode isn't considered a stable or documented format.

• The Truffle JIT compiler is slower than V8's because partial evaluation adds overhead. Slower compiler = more time spent in the interpreter = slower warmup.

• V8 is heavily optimized for great startup time because web pages typically don't live very long.

The first problem is being tackled by adding bytecode interpreter infrastructure to Truffle itself. In other words, the Truffle library will invent a bytecode format for your language, generate the bytecode interpreter for it, then partially evaluate that to create the JIT compiler! It can also handle things like persisting the bytecode to disk, as .pyc files or .class files do. Moving all the machinery needed to implement fast languages into the framework is very much the Truffle way.
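
For a sense of what the framework gives you today, here's a minimal sketch of a node in the classic Truffle AST style (real Truffle DSL annotations, but a toy node; this interpreter code is exactly what Graal partially evaluates into machine code):

    import com.oracle.truffle.api.dsl.Specialization;
    import com.oracle.truffle.api.nodes.Node;

    // A toy "add" node. The DSL's annotation processor generates the
    // dispatching subclass (AddNodeGen) from these specializations.
    public abstract class AddNode extends Node {

        public abstract Object execute(Object left, Object right);

        // Fast path: both operands are ints and the addition doesn't overflow.
        @Specialization(rewriteOn = ArithmeticException.class)
        protected int addInts(int left, int right) {
            return Math.addExact(left, right);
        }

        // Fallback: overflow or non-int operands fall back to doubles
        // (a real JS engine would also handle strings, objects, ...).
        @Specialization(replaces = "addInts")
        protected double addDoubles(double left, double right) {
            return left + right;
        }
    }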

The second problem is harder to solve. There is supposedly a thing called (I think) the second Futamura projection, where you partially evaluate the partial evaluator, but IIRC that's very hard to actually implement, and it matters less for server-side use cases.
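
For anyone wondering what that means concretely, here's a hand-written toy (nothing to do with Graal's actual machinery) illustrating the first projection, i.e. that specializing an interpreter to one fixed program gives you a "compiled" program:

    import java.util.List;
    import java.util.function.IntUnaryOperator;

    public class FutamuraToy {
        // A trivial "language": a program is a list of (op, n) steps,
        // where op 0 means "add n" and op 1 means "multiply by n".
        static int interpret(List<int[]> program, int x) {
            for (int[] op : program) {
                x = (op[0] == 0) ? x + op[1] : x * op[1];
            }
            return x;
        }

        // First Futamura projection, faked by hand: specialize the
        // interpreter to one fixed program and the dispatch loop
        // disappears. Graal does this automatically via partial evaluation.
        static IntUnaryOperator specialize(List<int[]> program) {
            IntUnaryOperator compiled = x -> x;
            for (int[] op : program) {
                IntUnaryOperator prev = compiled;
                int n = op[1];
                compiled = (op[0] == 0) ? x -> prev.applyAsInt(x) + n
                                        : x -> prev.applyAsInt(x) * n;
            }
            return compiled;
        }

        public static void main(String[] args) {
            List<int[]> program = List.of(new int[]{0, 3}, new int[]{1, 2}); // (x+3)*2
            System.out.println(interpret(program, 5));             // 16
            System.out.println(specialize(program).applyAsInt(5)); // 16
            // Second projection: specialize the specializer to the
            // interpreter itself, and you've built a compiler.
        }
    }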


GraalVM’s interpreter, AST, etc. are also written in Java, so there is warm-up for those as well. Also, the memory representation has to be somewhat generic and can’t be over-optimized for JavaScript specifically. Add to that that V8 has multiple tiers of interpreters and JIT compilers (I believe there are 3 at the moment?), all made specifically for JS, so it’s not really possible to compete with that here while remaining general.



