Regarding the example benchmark: It is mainly testing the performance of the String::compareToIgnoreCase method, which has a very different implementation in Java 9 and beyond compared to Java 8 (due to the "compact strings" feature). The method generally is not very complex, so it is not ideal for compiler comparisons.
...
The largest workload tested on Graal is currently the production use of Graal at Twitter where they get somewhere between 12% (CE) and 24% (EE) improvement.
I actually tested Play! on akka-http on Graal (maybe I'll write a blog post about it). Just the "Hello World" of Play!, and I saw roughly 25% improvement in requests/s with CE. I think a full Scala codebase would get the most out of Graal.
However, running Play dev mode with Graal EE didn't work well; it sometimes had strange class issues where it tried to load a "java.lang.Boolnae" class (note the misspelling).
I also looked into what else Graal is capable of, and it's cool to run C programs or even curl on top of it. Sadly, I wish the Node.js integration were better. I wish I could run Angular Universal on top of Graal within Play!, so that Play! could do the same thing as .NET Core with its Angular integration (just without running Node!). However, Graal can only run "pure" JavaScript (https://github.com/graalvm/graaljs/issues/2).
(native-image does not work with Scala 2.12 (Scala 2.12 emits invokedynamic, which does not work with SubstrateVM), and even Play! on 2.11 did not work)
I played with it a few weeks ago, mostly interested in Python, and the article I read and the Graal home page didn't make it clear that the Python support is not even close. I tried a "print 'hello world'", and that didn't work at all; sys.stdout.write did work. I then did a simple test of startup time, and CPython was an order of magnitude faster than Graal.
Graal's Python prints a warning when you start it that it is in a very early state.
So I tried Ruby, and I was able to do a simple print, and the startup time for Graal vs Ruby was about the same.
Looks like Java is the real target right now. A very interesting project, looking forward to seeing where it goes.
I've been playing with graalpython this last week too. The Python implementation is in its early stages. I could get a "hello world" working with both their binary builds and from source, but not much else.
Re the startup time: native images are intended to reduce startup time of Java code, as they AOT compile Java. Other languages are not, AFAIK, AOT-compiled, and so reducing startup time is not a goal -- overall performance is.
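For concreteness, a minimal sketch of that AOT path (assuming a GraalVM installation with the native-image tool on the PATH):

    // HelloWorld.java
    // Build: javac HelloWorld.java && native-image HelloWorld
    // The result is a standalone ./helloworld binary with fast startup,
    // because the Java code was compiled ahead of time instead of JITed.
    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println("Hello from an AOT-compiled native image");
        }
    }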
We get this confusion every time we talk about native compilation. GraalVM can do ahead-of-time compilation of Java code. That Java code could be an interpreter for JavaScript, written in Java. But GraalVM cannot do ahead-of-time compilation of JavaScript. We struggle to always make this clear.
And it's actually more complicated than that:
Of course GraalVM can compile JavaScript to native code; it just only does it at runtime, as a JIT.
GraalVM can run JavaScript during ahead-of-time compilation, so you can run your JavaScript program part of the way and then compile it at that point, with all its state compiled into the image. We use this to do things like pre-initialise libraries ahead of time, so when you start running everything is set up and ready to go.
Finally, the mathematical trick we use to compile JavaScript to native code at runtime, which is called partial evaluation or, more formally, the First Futamura Projection, has a more advanced theoretical variant called the Second Futamura Projection, which could be implemented in GraalVM to, indeed, compile JavaScript ahead of time to native code. But implementing this practically is an open problem and not really our goal.
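If it helps, the idea fits in a short sketch (the types and the closure-returning mix below are placeholders of mine; a real partial evaluator emits optimized residual code rather than closures):

    import java.util.function.BiFunction;
    import java.util.function.Function;

    class Futamura {
        // Placeholder types: a guest program, its input, and its output.
        interface Source {}
        interface Input {}
        interface Output {}

        // An interpreter maps (program, input) -> output.
        static Output interpret(Source program, Input input) {
            throw new UnsupportedOperationException("stand-in for a real interpreter");
        }

        // A partial evaluator ("mix") specializes a two-argument function on a
        // known first argument. Here it merely closes over the argument.
        static <A, B, C> Function<B, C> mix(BiFunction<A, B, C> f, A known) {
            return b -> f.apply(known, b);
        }

        public static void main(String[] args) {
            Source program = null; // some concrete guest-language program

            // First projection: mix(interpret, program) = a compiled program.
            Function<Input, Output> compiled = mix(Futamura::interpret, program);

            // Second projection: specializing mix itself on the interpreter
            // yields a compiler. Java's types can't express the
            // self-application, so this is only the curried view of it.
            Function<Source, Function<Input, Output>> compiler =
                p -> mix(Futamura::interpret, p);
        }
    }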
I wish people would stop using niche terms as if a) they were common knowledge, and b) they were crucial to the discussion at hand. It's much more intuitive to explain what's going on and then say that what is described is also known as ___.
The Futamura Projections can be explained in about three lines and are not at all a complex concept to wrap your head around.
Casually inserting these terms in a forum where most people likely haven't had much exposure to partial evaluation terminology makes it sound like you want to either forcefully popularize these terms or make yourself sound more important.
I think you are wrong about this. Unlike in a normal conversation, specific vocabulary used on the internet can be instantly looked up. This allows for diving deeper into the topics if they interest you.
Thanks for adding those to the Benchmarks Game - I think it was you?
If you are interested in very short running programs, then these benchmarks can be interesting to look at.
If you are more interested in long running programs, like web servers, then I don't think you'll find many professional virtual machine implementors who will agree that this is a valid way to benchmark things.
But we probably also have some optimisation bugs to work out still. There are even some errors in the logs of those.
I think Truffle has a pretty good showing there that matches your description -- quick scripts don't get much help but longer-running ones have some pretty incredible improvements.
What's "quick" and "long running" in CPU secs on some machine?
What language implementation are we using as a baseline when we say "quick" and "long running"?
Otherwise someone might well say that 8 minutes with TruffleRuby is "long running" and CRuby 2.5 makes "some pretty incredible improvement" over that :-)
This is a nice video where a Twitter engineer discusses performance improvements with Graal. Twitter runs thousands of JVMs and found massive savings in time and cost with the move: https://www.youtube.com/watch?v=pR5NDkIZBOA
I've been thinking about this a lot - binary interfaces.
We have this problem of one standard binary interface, the C ABI; it is the only one. If I want to write a library that can be used by any language, I have to export a C interface.
There have been solutions, such as CORBA and COM, but cross-language support never really took off. We end up with ports of libraries: there are tons of libraries ported between C++, Java, C#, etc., all because they live in their own worlds. Using a C++ library in C# requires exporting a C interface and then hand-wrapping the library with P/Invoke.
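For what it's worth, GraalVM's native-image points in this direction: it can export Java methods behind plain C symbols. A rough sketch using the org.graalvm.nativeimage API (the library and symbol names here are made up):

    import org.graalvm.nativeimage.IsolateThread;
    import org.graalvm.nativeimage.c.function.CEntryPoint;

    public final class LibAdd {
        // Built with: native-image --shared
        // This exports a C-ABI symbol "lib_add" that any language with a
        // C FFI can call; the IsolateThread parameter is required by the API.
        @CEntryPoint(name = "lib_add")
        static int add(IsolateThread thread, int a, int b) {
            return a + b;
        }
    }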
Before I read about GraalVM I was under the impression that WebAssembly would accomplish this.
If I understand what you're saying, you're suggesting that a single self-tail-recursive function can be rewritten using an exception thrown and caught within the function.
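Something like this minimal sketch, if I follow (the class names are mine, not Truffle's actual mechanism):

    // A control-flow exception carrying the arguments of the next "call".
    class TailCall extends RuntimeException {
        final long n, acc;
        TailCall(long n, long acc) {
            super(null, null, false, false); // no message, no stack trace
            this.n = n;
            this.acc = acc;
        }
    }

    class Factorial {
        static long fact(long n, long acc) {
            while (true) {
                try {
                    if (n <= 1) return acc;
                    throw new TailCall(n - 1, acc * n); // the self tail call
                } catch (TailCall t) {
                    n = t.n;     // unwinding lands back here: same frame,
                    acc = t.acc; // new arguments, no stack growth
                }
            }
        }

        public static void main(String[] args) {
            System.out.println(fact(20, 1)); // 2432902008176640000
        }
    }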
That's something, but what about a set of mutually tail-recursive functions? Of course one can always use a trampoline, but then you have to pay the cost of consing thunks. It would still be worthwhile to have true tail-call instructions in the bytecode.
My understanding is that Graal's compilation units are basically an amalgamation of all of the hot code in an area; function boundaries don't really come into it in the same way they do in C. If you're reliably tail recursing, that's going to be in the same compilation unit. This image might make it clearer:
Truffle frames are just Java objects, so when the exception returns us to the root of the method, we can create a new frame to run an entirely different method.
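Continuing the hypothetical sketch above, the catch at the root can dispatch to a different method entirely, which covers mutual tail recursion without growing the stack (again, made-up names, not our actual implementation):

    // Carries which function to "call" next, plus its argument.
    class SwitchCall extends RuntimeException {
        final int target;
        final long n;
        SwitchCall(int target, long n) {
            super(null, null, false, false);
            this.target = target;
            this.n = n;
        }
    }

    class EvenOdd {
        static boolean evenStep(long n) { if (n == 0) return true;  throw new SwitchCall(1, n - 1); }
        static boolean oddStep(long n)  { if (n == 0) return false; throw new SwitchCall(0, n - 1); }

        // The "root": each unwind lands here, and we start a fresh frame
        // for whichever method the tail call targeted.
        static boolean run(int target, long n) {
            while (true) {
                try {
                    return target == 0 ? evenStep(n) : oddStep(n);
                } catch (SwitchCall s) {
                    target = s.target;
                    n = s.n;
                }
            }
        }

        public static void main(String[] args) {
            System.out.println(run(0, 1_000_000)); // true: 1,000,000 is even
        }
    }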
We aren't looking to modify Java, so we aren't the people to ask for a tail-call instruction in the bytecode.
We build entire JavaScript, Ruby, and Python interpreters using the native code generator, so we do know it works for non-trivial applications and libraries. I've also used it to do things like compile the third-party Apache SIS geospatial library for use from a native application without issue.
The author tried just one benchmark - there are many other benchmarks and use cases. Twitter reports it's around 11% faster for their real and extremely large codebase, so we also know it's significantly faster for other people.
Out of curiosity, dynamic class loading seems like one of those things where you would need to embed an interpreter in the application itself. So how does one deal with such cases?
Dynamic class loading is not supported on SubstrateVM because there is no built-in infrastructure to parse bytecodes, interpret them, etc. Adding those to SubstrateVM would defeat the purpose of having a thin-layer VM. But you are right: you could have an interpreter in the application itself. However, a simple bytecode interpreter would have very bad performance. In fact, that's how Truffle languages run on SubstrateVM: as far as SubstrateVM is concerned, the Graal/Truffle stack is an application, albeit an application that knows how to interpret/compile other languages. So an interesting exploration path is a Truffle-based Java implementation that could be embedded in the image on demand and could load classes dynamically. You could still compile your known Java classes AOT and defer the dynamic classes to the interpreter. You might pay extra cost in memory footprint, code interpretation/warmup/compilation time, etc., but you would be able to slowly migrate more Java code from dynamic compilation into AOT.
Can somebody please explain to me in plain English how this is different from Vert.x? Even though Graal seems to be OSS, I would rather use an Eclipse Foundation project (Vert.x) than anything from Oracle.
What if the only code you run is microcode? If 60-70% of a given runtime ends up in the same few dozen small functions, it's certainly worth benchmarking them.
One problem is that testing these functions in isolation and with synthetic data is not the same thing as testing them in your real program and with real data. Modern JIT compilers like Graal are extremely sophisticated and will look at your program in ways that are not easy to understand.
Even just preventing a powerful compiler like Graal from optimising away your microbenchmark is going to be a challenge - Graal currently defeats the state-of-the-art JMH, for example, and most people wouldn't even go as far as to use JMH.
I have many examples of microbenchmarks that Graal (used as a JIT for Ruby, in my case) will optimise away in ways people find very surprising.
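To make that concrete, here's a minimal JMH sketch (the class and strings are invented for illustration; JMH's Blackhole is the standard guard against dead-code elimination, though as I said, even that isn't bulletproof against Graal):

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Benchmark)
    public class CompareBench {
        String a = "Hello World";
        String b = "hello world";

        // The result is unused, so a sufficiently smart JIT may delete the
        // whole call and you end up timing an empty loop.
        @Benchmark
        public void broken() {
            a.compareToIgnoreCase(b);
        }

        // Blackhole.consume keeps the result observably "live" so the work
        // can't simply be removed.
        @Benchmark
        public void guarded(Blackhole bh) {
            bh.consume(a.compareToIgnoreCase(b));
        }
    }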
By no means do I think microbenchmarks convey no useful information. My point is that they are never useful data in isolation, because the only time you can act on them is if you know why they are behaving the way they are. In this case it seems like the only thing being measured was String::compareToIgnoreCase, and the difference had nothing to do with the optimizer. In other cases the cause will be something else.
I wrote a blog post in response to someone's attempt to investigate different implementations of a small function through a sequence of microbenchmarks without asking why. I go into detail about a bunch of ways they ended up being wrong because they thought the numbers spoke for themselves. This only gets worse when talking about JITs, since their behaviour is even less local.