Interfacing with native methods on Graal VM

kjeetgill · on July 7, 2018

Awesome. I wonder how well this works on a stock JDK10 using graal.

Whenever I see a speed boost for doing what is conceptually the same thing, I'm always curious where the fat was cut. What did we give up? You can dump the resulting assembly with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`, and a diff might be revealing.
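As a minimal sketch of something worth disassembling, here is a small BLAS-like kernel called often enough for the JIT to compile it. The class and method names (`HotLoop`, `dot`) are hypothetical and not taken from the article.

```java
import java.util.Arrays;

// Hypothetical class and method names; a sketch of a workload to disassemble.
final class HotLoop {
    // A small BLAS-like kernel: dot product of two equal-length vectors.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = new double[1024];
        double[] b = new double[1024];
        Arrays.fill(a, 1.5);
        Arrays.fill(b, 2.0);
        double sink = 0.0;
        // Call the kernel repeatedly so the JIT treats it as hot and compiles it.
        for (int iter = 0; iter < 20_000; iter++) {
            sink += dot(a, b);
        }
        // Print the accumulated value so the loop is not dead code.
        System.out.println(sink);
    }
}
```

Running this once as `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly HotLoop` and once more with `-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler` added (the JDK 10 switch for Graal as the JIT on Linux x64) should give two dumps to diff. Note that `PrintAssembly` only produces disassembly when the hsdis plugin is available to the JVM.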

My hunch is that the line from the tutorial: `@CFunction(transition = Transition.NO_TRANSITION)` makes all the difference. Explanation of NO_TRANSITION from [0]:

No prologue and epilogue is emitted. The C code must not block and must not call back to Java. Also, long running C code delays safepoints (and therefore garbage collection) of other threads until the call returns.

Which is probably great for BLAS-like calls. This lines up with my understanding from Cliff Click's great talk "Why is JNI Slow?"[1], which basically says that to be faster you need to make assumptions about what the native code can and can't do, and that developers would generally shoot themselves in the foot.

[0]: https://github.com/oracle/graal/blob/master/sdk/src/org.graa...

[1]: https://www.youtube.com/watch?v=LoyBTqkSkZk

Twirrim · on July 7, 2018

A team I was on in the past had a well-known performance bottleneck in its most performance-critical component. It couldn't possibly be avoided or minimised. It was called with high frequency and, wall-clock-wise, each call didn't take too long.

With "JNI is slow" being the conventional wisdom, and knowing just how frequent the calls would be, people had ignored it as an option.

One day, one of the devs who was most bothered by the bottleneck had an hour spare, threw the conventional wisdom out the window, dropped in JNI calls to a standard (highly optimised) library, and re-benchmarked. 40% performance boost. Further experiments found that "JNI is slow" isn't as true as conventional wisdom had it.

pjmlp · on July 7, 2018

ART on Android also has some annotations (@FastNative) for that, but they are forbidden outside system code given that they are quite unsafe.

https://android.googlesource.com/platform/libcore/+/master/d...

EDIT: I forgot to mention @CriticalNative as well

https://android.googlesource.com/platform/libcore/+/master/d...

chrisseaton · on July 7, 2018

Yes, I think `Transition.NO_TRANSITION` uses the new FFI, GNFI (the Graal Native Function Interface), described here [0].

[0]: https://dl.acm.org/citation.cfm?id=2500832

Reason077 · on July 7, 2018

Back in the day, GCC's native Java compiler, "GCJ", had an alternative native method interface called CNI.

GCC recognized `extern "Java"` in headers generated from class files. You could then call (gcj-compiled) Java classes from C++ as if they were native C++ classes, as well as implement Java "native" methods in natural C++.

The whole thing performed a lot better than JNI since it was, more or less, just using the standard platform calling conventions. Calling a native CNI method from Java had the same overhead as any regular Java virtual method call.

Ultimately, GCJ faded away because there wasn't a great deal of interest in native Java compilation back then, and there were too many compatibility challenges in the pre-OpenJDK days. But it's interesting to see many of its ideas coming back now in the form of Graal/GraalVM.

pjmlp · on July 7, 2018

There was interest in native Java compilation, not in doing the work for free.

Most third-party commercial Java SDKs do have support for native compilation, especially in the embedded space.

Around 2009 GCJ suffered an exodus of developers to OpenJDK.

repolfx · on July 7, 2018

There's an effort to bring a more modern FFI to Java that works similarly to the one described in the article, called Project Panama. It has tools to convert C header files into the equivalent annotated Java definitions and is intended to help improve performance as well.

You can follow along here:

http://mail.openjdk.java.net/pipermail/panama-dev/

The same project is also adding support for writing vector code in Java (SSE, AVX etc).
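For readers who have not followed the list, here is a minimal sketch of a native call in the shape this work later shipped in: the `java.lang.foreign` API finalized in JDK 22. This is not the annotation-based binder being prototyped at the time of this thread. The class name `StrlenDemo` is hypothetical, and the sketch assumes a 64-bit platform where C's `size_t` maps to a Java `long`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical demo class; requires JDK 22 or newer.
final class StrlenDemo {
    static long strlen(String s) {
        Linker linker = Linker.nativeLinker();
        // Find the C library's strlen through the linker's default lookup.
        MemorySegment address = linker.defaultLookup().find("strlen").orElseThrow();
        // size_t strlen(const char *), assuming size_t is 64 bits wide.
        MethodHandle handle = linker.downcallHandle(
                address,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // Copy the string into off-heap memory as NUL-terminated UTF-8.
            MemorySegment cString = arena.allocateFrom(s);
            return (long) handle.invokeExact(cString);
        } catch (Throwable t) {
            // invokeExact declares Throwable; wrap it to keep the signature simple.
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(strlen("hello"));
    }
}
```

No C glue code or generated header is involved here; the function signature is described in Java and the off-heap string is freed when the arena closes.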

agibsonccc · on July 7, 2018

Disclaimer: I'm affiliated with a semi-competing project to Panama called JavaCPP: https://github.com/bytedeco/javacpp

I can say for a fact that Panama is not seriously targeting this space. We implement a ton of that native code today, and it works with C++ and actual Android. We also handle GPUs. Project Panama is only targeting C, and even then will only do it in a cross-platform, non-committal fashion. They aren't doing it the way they should be in order to properly target native vectorized code.

We know this from experience, because this is all we do: https://github.com/deeplearning4j/deeplearning4j https://github.com/bytedeco/javacpp-presets

We tried seeing if we could get some of this work into the JDK, but their goals fundamentally compete with what it takes to get vector math to be fast. It's also not nearly as ambitious as it needs to be to handle real-world tensor workloads.

bitmapbrother · on July 7, 2018

>Project Panama is only targeting C, and even then will only do it in a cross-platform, non-committal fashion

John Rose of Oracle:

Panama is not just about C headers. It is about building a framework in which any data+function schema of APIs can be efficiently plugged into the JVM. So it's not just C or C++ but protocol specs and persistent memory structures and on-disk formats and stuff not invented yet. We've been relentless about designing the framework down to essential functionality (memory access and procedure calls), not just our (second-)favorite language or compiler.

The important deliverable of Panama is therefore not Posix bindings, but rather a language-neutral memory layout-and-access mechanism, plus a language-neutral (initially ABI-compliant) subroutine invocation mechanism. The jextract tool grovels over ANSI C (soon C++) schemas and translates to the layouts and function calls, bound helpfully to Java APIs with unsurprising names. But the jextract tool is just the first plugin of many.

We do look forward to building more plugins for more metadata formats outside the Java ecosystem, such as what you are building.

In fact, I expect that, in the long run, we will not build all of the plugins, but that people who invent new data schemas (or even data+function schemas or languages) will consider using our tools (layouts, binder, metadata annotations) to integrate with Java, instead of the standard technique, which is to write a set of Java native functions from scratch, or (if you are very clever) with tooling. The binder pattern, in particular, seems to be a great way to spin repetitive code for accessing data structures of all sorts, not just C or Java. I hope it will be used, eventually, in preference to static protocol compilers. The JVM is very good at on-line optimization, even of freshly spun code, so it is a natural framework for building a binder.

>They aren't doing it the way they should be in order to properly target native vectorized code.

Which is interesting since Intel is the one contributing the majority of the vector code changes.

agibsonccc · on July 8, 2018

Yes, that's what I stated above. I've also stated that I haven't just read the news. We've talked to that team in person. Being language/platform neutral does not mean it is going to fulfill most use cases people would have for C bindings. Java tends to be "good enough" for a lot of use cases out of the box. It might help a bit with libraries like Netty and with memory management, but it's not going to work on real-world math code, which, as I stated, is our main use case.

That codegen isn't going to match what you need to do for real speed on CPUs or GPUs when writing vectorized math code.

Re: his last point. That's exactly what we talked to that team about. We don't feel those tools are going to work for real-world use cases. We already do the codegen and auto bindings/mapping ourselves, in addition to the memory management.

needusername · on July 7, 2018

I don't know, it looks as if you have to hardcode pointer sizes in the source code.

https://twitter.com/sundararajan_a/status/101507363642677248...

chrisseaton · on July 7, 2018

Where are you seeing that? The pointer in that example doesn't have a hardcoded size.

needusername · on July 7, 2018

u64 in the annotation value

Check out this document http://cr.openjdk.java.net/~mcimadamore/panama/panama-binder... for the syntax.