Regarding the example benchmark: It is mainly testing the performance of the String::compareToIgnoreCase method, which has a very different implementation in Java 9 and beyond compared to Java 8 (due to the "compact strings" feature). The method generally is not very complex, so it is not ideal for compiler comparisons.
...
The largest workload tested on Graal is currently the production use of Graal at Twitter where they get somewhere between 12% (CE) and 24% (EE) improvement.
I actually tested Play! on akka-http on Graal (maybe I'll write a blog post about it). Just the "Hello World" of Play!, and I saw roughly 25% improvement in requests/s with CE. I think a full Scala codebase would get the most out of Graal.
However, running Play dev mode with Graal EE didn't work well; it sometimes had strange class issues where it tried to load a "java.lang.Boolnae" class (note the misspelling).
I also looked into what else Graal is capable of, and it's cool to run C programs or even curl on top of it. Sadly, I wish the Node.js integration were better. I wish I could run Angular Universal on top of Graal within Play!, so that Play! could do the same thing as .NET Core with its Angular integration (just without running Node!). However, Graal can only run "pure" JavaScript (https://github.com/graalvm/graaljs/issues/2).
(native-image does not work with Scala 2.12 (Scala 2.12 emits invokedynamic, which does not work with SubstrateVM), and even Play! on 2.11 did not work)
I played with it a few weeks ago, mostly interested in Python, and the article I read and the Graal home page didn't make it clear that the Python support is not even close. I tried a "print 'hello world'", and that didn't work at all; sys.stdout.write did work. I then did a simple test of startup time, and CPython was an order of magnitude faster than Graal.
Graal's Python prints a warning when you start it that it is in a very early state.
So I tried Ruby, and I was able to do a simple print, and the startup time for Graal vs Ruby was about the same.
Looks like Java is the real target right now. A very interesting project, looking forward to seeing where it goes.
I've been playing with graalpython this last week too. The Python implementation is in its early stages. I could get a "hello world" working with both their binary builds and from source, but not much else.
Re the startup time: native images are intended to reduce startup time of Java code, as they AOT compile Java. Other languages are not, AFAIK, AOT-compiled, and so reducing startup time is not a goal -- overall performance is.
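For concreteness, a minimal sketch of that AOT path (assuming a GraalVM installation with the native-image tool on the PATH):

    // HelloWorld.java
    // Build: javac HelloWorld.java && native-image HelloWorld
    // The result is a standalone ./helloworld binary with fast startup,
    // because the Java code was compiled ahead of time instead of JITed.
    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println("Hello from an AOT-compiled native image");
        }
    }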
We get this confusion every time we talk about native compilation. GraalVM can do ahead-of-time compilation of Java code. That Java code could be an interpreter for JavaScript, written in Java. But GraalVM cannot do ahead-of-time compilation of JavaScript. We struggle to always make this clear.
And it's actually more complicated than that:
Of course GraalVM can compile JavaScript to native code; it just only does it at runtime, as a JIT.
GraalVM can run JavaScript during ahead-of-time compilation, so you can run your JavaScript program part of the way and then compile it at that point, with all its state compiled into the image. We use this to do things like pre-initialise libraries ahead of time, so when you start running everything is set up and ready to go.
Finally, the mathematical trick we use to compile JavaScript to native code at runtime, which is called partial evaluation or, more formally, the First Futamura Projection, has a more advanced theoretical variant called the Second Futamura Projection, which could be implemented in GraalVM to, indeed, compile JavaScript ahead of time to native code. But implementing this practically is an open problem and not really our goal.
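If it helps, the idea fits in a short sketch (the types and the closure-returning mix below are placeholders of mine; a real partial evaluator emits optimized residual code rather than closures):

    import java.util.function.BiFunction;
    import java.util.function.Function;

    class Futamura {
        // Placeholder types: a guest program, its input, and its output.
        interface Source {}
        interface Input {}
        interface Output {}

        // An interpreter maps (program, input) -> output.
        static Output interpret(Source program, Input input) {
            throw new UnsupportedOperationException("stand-in for a real interpreter");
        }

        // A partial evaluator ("mix") specializes a two-argument function on a
        // known first argument. Here it merely closes over the argument.
        static <A, B, C> Function<B, C> mix(BiFunction<A, B, C> f, A known) {
            return b -> f.apply(known, b);
        }

        public static void main(String[] args) {
            Source program = null; // some concrete guest-language program

            // First projection: mix(interpret, program) = a compiled program.
            Function<Input, Output> compiled = mix(Futamura::interpret, program);

            // Second projection: specializing mix itself on the interpreter
            // yields a compiler. Java's types can't express the
            // self-application, so this is only the curried view of it.
            Function<Source, Function<Input, Output>> compiler =
                p -> mix(Futamura::interpret, p);
        }
    }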
I wish people would stop using niche terms as if a) they were common knowledge, and b) they were crucial to the discussion at hand. It's much more intuitive to explain what's going on and then say that what is described is also known as ___.
The Futamura Projections can be explained in about three lines and are not at all a complex concept to wrap your head around.
Casually inserting these terms in a forum where most people likely haven't had much exposure to partial evaluation terminology makes it sound like you want to either forcefully popularize these terms or make yourself sound more important.
I think you are wrong about this. Unlike in a normal conversation, specific vocabulary used on the internet can be instantly looked up. This allows for diving deeper into the topics if they interest you.
Thanks for adding those to the Benchmarks Game - I think it was you?
If you are interested in very short running programs, then these benchmarks can be interesting to look at.
If you are more interested in long running programs, like web servers, then I don't think you'll find many professional virtual machine implementors who will agree that this is a valid way to benchmark things.
But we probably also have some optimisation bugs to work out still. There are even some errors in the logs of those.
I think Truffle has a pretty good showing there that matches your description -- quick scripts don't get much help but longer-running ones have some pretty incredible improvements.
What's "quick" and "long running" in CPU secs on some machine?
What language implementation are we using as a baseline when we say "quick" and "long running"?
Otherwise someone might well say that 8 minutes with TruffleRuby is "long running" and CRuby 2.5 makes "some pretty incredible improvement" over that :-)
This is a nice video where a Twitter engineer discusses performance improvements with Graal. Twitter runs thousands of JVMs and found massive savings in time and cost with the move: https://www.youtube.com/watch?v=pR5NDkIZBOA
I've been thinking about this a lot - binary interfaces.
We have this problem of one standard binary interface, the C ABI; it is the only one. If I want to write a library that can be used by any language, I have to export a C interface.
There have been solutions, such as CORBA and COM, but cross-language support never really took off. We end up with ports of libraries: there are tons of libraries ported between C++, Java, C#, etc., all because they live in their own worlds. Using a C++ library in C# requires exporting a C interface and then hand-wrapping the library with P/Invoke.
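For what it's worth, GraalVM's native-image points in this direction: it can export Java methods behind plain C symbols. A rough sketch using the org.graalvm.nativeimage API (the library and symbol names here are made up):

    import org.graalvm.nativeimage.IsolateThread;
    import org.graalvm.nativeimage.c.function.CEntryPoint;

    public final class LibAdd {
        // Built with: native-image --shared
        // This exports a C-ABI symbol "lib_add" that any language with a
        // C FFI can call; the IsolateThread parameter is required by the API.
        @CEntryPoint(name = "lib_add")
        static int add(IsolateThread thread, int a, int b) {
            return a + b;
        }
    }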
Before I read about GraalVM I was under the impression that WebAssembly would accomplish this.
If I understand what you're saying, you're suggesting that a single self-tail-recursive function can be rewritten using an exception thrown and caught within the function.
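Something like this minimal sketch, if I follow (the class names are mine, not Truffle's actual mechanism):

    // A control-flow exception carrying the arguments of the next "call".
    class TailCall extends RuntimeException {
        final long n, acc;
        TailCall(long n, long acc) {
            super(null, null, false, false); // no message, no stack trace
            this.n = n;
            this.acc = acc;
        }
    }

    class Factorial {
        static long fact(long n, long acc) {
            while (true) {
                try {
                    if (n <= 1) return acc;
                    throw new TailCall(n - 1, acc * n); // the self tail call
                } catch (TailCall t) {
                    n = t.n;     // unwinding lands back here: same frame,
                    acc = t.acc; // new arguments, no stack growth
                }
            }
        }

        public static void main(String[] args) {
            System.out.println(fact(20, 1)); // 2432902008176640000
        }
    }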
That's something, but what about a set of mutually tail-recursive functions? Of course one can always use a trampoline, but then you have to pay the cost of consing thunks. It would still be worthwhile to have true tail-call instructions in the bytecode.
My understanding is that Graal's compilation units are basically an amalgamation of all of the hot code in an area; function boundaries don't really come into it in the same way they do in C. If you're reliably tail recursing, that's going to be in the same compilation unit. This image might make it clearer:
Truffle frames are just Java objects, so when the exception returns us to the root of the method, we can create a new frame to run an entirely different method.
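Continuing the hypothetical sketch above, the catch at the root can dispatch to a different method entirely, which covers mutual tail recursion without growing the stack (again, made-up names, not our actual implementation):

    // Carries which function to "call" next, plus its argument.
    class SwitchCall extends RuntimeException {
        final int target;
        final long n;
        SwitchCall(int target, long n) {
            super(null, null, false, false);
            this.target = target;
            this.n = n;
        }
    }

    class EvenOdd {
        static boolean evenStep(long n) { if (n == 0) return true;  throw new SwitchCall(1, n - 1); }
        static boolean oddStep(long n)  { if (n == 0) return false; throw new SwitchCall(0, n - 1); }

        // The "root": each unwind lands here, and we start a fresh frame
        // for whichever method the tail call targeted.
        static boolean run(int target, long n) {
            while (true) {
                try {
                    return target == 0 ? evenStep(n) : oddStep(n);
                } catch (SwitchCall s) {
                    target = s.target;
                    n = s.n;
                }
            }
        }

        public static void main(String[] args) {
            System.out.println(run(0, 1_000_000)); // true: 1,000,000 is even
        }
    }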
We aren't looking to modify Java, so we aren't the people to ask for a tail-call instruction in the bytecode.
We build entire JavaScript, Ruby, and Python interpreters using the native code generator, so we do know it works for non-trivial applications and libraries. I've also used it to do things like compile the third-party Apache SIS geospatial library for use from a native application without issue.
The author tried just one benchmark - there are many other benchmarks and use cases. Twitter reports it's around 11% faster for their real and extremely large codebase, so we also know it's significantly faster for other people.
Out of curiosity, dynamic class loading seems like one of those things where you would need to embed an interpreter in the application itself. So how does one deal with such cases?
Dynamic class loading is not supported on SubstrateVM because there is no built-in infrastructure to parse bytecodes, interpret them, etc. Adding those to SubstrateVM would defeat the purpose of having a thin-layer VM. But you are right: you could have an interpreter in the application itself. However, a simple bytecode interpreter would have very bad performance. In fact, that's how Truffle languages run on SubstrateVM: as far as SubstrateVM is concerned, the Graal/Truffle stack is an application, albeit an application that knows how to interpret/compile other languages. So an interesting exploration path is a Truffle-based Java implementation that could be embedded in the image on demand and could load classes dynamically. You could still compile your known Java classes AOT and defer the dynamic classes to the interpreter. You might pay extra cost in memory footprint, code interpretation/warmup/compilation time, etc., but you would be able to slowly migrate more Java code from dynamic compilation into AOT.
Can somebody please explain to me in plain English how this is different from Vert.x? Even though Graal seems to be OSS, I would rather use an Eclipse Foundation project (Vert.x) than anything from Oracle.
What if the only code you run is microcode? If 60-70% of a given runtime ends up in the same few dozen small functions, it's certainly worth benchmarking them.
One problem is that testing these functions in isolation and with synthetic data is not the same thing as testing them in your real program and with real data. Modern JIT compilers like Graal are extremely sophisticated and will look at your program in ways that are not easy to understand.
Even just preventing a powerful compiler like Graal from optimising away your microbenchmark is going to be a challenge - Graal currently defeats the state-of-the-art JMH, for example, and most people wouldn't even go as far as to use JMH.
I have many examples of microbenchmarks that Graal (used as a JIT for Ruby, in my case) will optimise away in ways people find very surprising.
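To make that concrete, here's a minimal JMH sketch (the class and strings are invented for illustration; JMH's Blackhole is the standard guard against dead-code elimination, though as I said, even that isn't bulletproof against Graal):

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Benchmark)
    public class CompareBench {
        String a = "Hello World";
        String b = "hello world";

        // The result is unused, so a sufficiently smart JIT may delete the
        // whole call and you end up timing an empty loop.
        @Benchmark
        public void broken() {
            a.compareToIgnoreCase(b);
        }

        // Blackhole.consume keeps the result observably "live" so the work
        // can't simply be removed.
        @Benchmark
        public void guarded(Blackhole bh) {
            bh.consume(a.compareToIgnoreCase(b));
        }
    }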
By no means do I think microbenchmarks convey no useful information. My point is that they are never useful data in isolation, because the only time you can act on them is if you know why they are behaving the way they are. In this case it seems like the only thing being measured was String::compareToIgnoreCase, and the difference had nothing to do with the optimizer. In other cases the cause will be something else.
I wrote a blog post in response to someone's attempt to investigate different implementations of a small function through a sequence of microbenchmarks without asking why. I go into detail about a bunch of ways they ended up being wrong because they thought the numbers spoke for themselves. This only gets worse when talking about JITs, since their behaviour is even less local.