Surgical Precision JIT Compilers [pdf] (epfl.ch)
48 points by p4bl0 on March 31, 2014 | 24 comments



At one point the programmer was expected to regularly override the compiler, replacing sections with hand-tuned assembly for maximum throughput, based on their knowledge of the CPU architecture, hand-profiling, and the expected calling patterns.

Then the programmer was expected mostly to trust the compiler, which either applies massive, unwieldy (for a human) optimizations at compile time, or develops new optimizations on the fly based on JIT profiling data, or both.

There has not been nearly enough research on cooperating with the compiler: providing information that is useful for enabling specific optimizations, that doesn't force potentially unproductive optimizations when that information is wrong, and that still represents the information at a high enough level to be human-comprehensible.


I think the register and volatile keywords in C and C++, and the inline keyword in C++, basically constitute real-world research into exactly that process. In pretty much every case, two things eventually doomed the hint: the compiler needed to override the programmer's wishes when the programmer was wrong (and would often apply the optimization without being asked anyway), and the programmer couldn't reliably recognize the cases where the optimization would actually make things worse. So the hints have become less and less relevant to the compiler's decisions over time. Usually this ends in a "YES I REALLY MEAN IT" compiler directive, which just looks like an arms race.

I suppose it's possible this has been the wrong direction, but I suspect it was effectively correct.


The restrict keyword can be used to assist in vectorizing your code, which is equivalent to inserting the correct binary instructions and is almost never the wrong strategy.
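A minimal sketch of the kind of thing restrict enables (the function name and flags are illustrative): by promising the compiler that the pointers don't alias, you free it to vectorize the loop, e.g. under gcc -O3.

    #include <stddef.h>

    /* Without restrict, the compiler must assume dst and src might
     * overlap, so it cannot safely reorder or vectorize the loads and
     * stores. With restrict, the programmer promises there is no
     * aliasing, and a loop like this typically auto-vectorizes. */
    void scale(float *restrict dst, const float *restrict src,
               float factor, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * factor;
    }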


You could quite easily wind up throwing restrict on something that does actually get aliased, which would result in some weird heisenbugs. That said, it may be true that restrict will do better than the others have, but it's a lot 'newer' (even though it came from a 15-year-old revision of C, adoption seems to have been particularly slow), so I think the jury's still out.


I'd be interested to see how much of a difference cooperating with the compiler makes. I think the usual advice applies here: until you profile, you don't know where the bottlenecks in your code are. Similarly, I suspect that the places where humans could add anything of value for the compiler rarely have much impact on actual performance (with the exception of types, which I'll discuss below).

And in most cases the compiler is much better at optimizing than humans. Register allocation is probably the archetypal example of this. C has a register keyword that the compiler is supposed to interpret as "try to put this variable in a register". In most cases you aren't going to be able to perform a liveness analysis of your variables better than the compiler can, and I'm not sure modern C compilers pay attention to the register keyword at all. They might just perform their own optimizations, since compiler writers realize that in >99% of cases that's going to be better anyway.
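For reference, a sketch of what the hint looks like (the function is made up); on modern gcc/clang at -O1 and above the allocation decision comes from the optimizer's own liveness analysis either way:

    /* The register storage class is a request, not a command: the
     * compiler may ignore it, and modern optimizers allocate registers
     * based on their own liveness analysis. Its one hard guarantee in
     * C is that you cannot take the variable's address. */
    long sum(const long *a, long n) {
        register long total = 0;   /* a hint, and likely ignored */
        for (long i = 0; i < n; i++)
            total += a[i];
        /* &total would be a compile-time error here */
        return total;
    }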

Lastly, to touch briefly on types: this is one area that does make a difference to runtime performance. Basically, if the compiler can type-check code it can perform more aggressive optimizations, because it can rule out certain classes of errors. Additionally, static types don't have the overhead associated with dynamic typing in terms of tagging variables with their type information. Functional languages in particular are nice from a compiler writer's perspective, because if you can guarantee certain parts of the source code are pure you can perform more optimizations than you could otherwise. For example, let's say you're using C++ and you have a function that doesn't return anything, doesn't take any arguments, and doesn't update any globals. A Haskell compiler could probably eliminate this function call, since it doesn't seem to be doing anything, but in C++ it's most likely doing I/O or is being called for some other side effect, and so we have to keep it around.
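A small illustration of that point in C (tick() is a hypothetical function defined in some other translation unit):

    /* tick() takes no arguments and returns nothing, yet its signature
     * tells the compiler nothing about what it does. Compiled
     * separately, it must be assumed to have side effects (I/O,
     * globals, ...), so the call cannot be eliminated. In a pure
     * language, a function of this "shape" could do nothing observable
     * and the call could be dropped. */
    void tick(void);

    int main(void) {
        tick();   /* must be kept without whole-program knowledge */
        return 0;
    }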


Aliasing information is one place the programmer can help the compiler. The 'restrict' keyword in C/C++ is a great example of this. Also information provided by 'const' can be very helpful for the compiler. While both of these aren't strictly hints because they limit what you can do and still get correct results, they do tell the compiler facts that would be otherwise difficult or impossible to prove.


'const' is a very good example, and one I hadn't thought about. To some extent functional languages pass every function argument with const, since the arguments are call by value rather than call by reference. At first glance this might seem disastrous in terms of performance if, for example, you pass in a list with 1,000,000 entries or something like that, but any reasonable implementation of a functional language provides ways of doing this without the excessive space and time costs that a naive implementation would have.
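A rough sketch of the standard trick, assuming that's what such an implementation does: a persistent, structure-sharing list, where "passing the list" is just passing a pointer and "adding to it" allocates a single node.

    #include <stdlib.h>

    /* A persistent cons list: nodes are never mutated, so extended
     * lists can share the entire existing list as their tail. */
    typedef struct node {
        int head;
        const struct node *tail;   /* shared, never modified */
    } node;

    const node *cons(int head, const node *tail) {
        node *n = malloc(sizeof *n);
        if (!n) abort();
        n->head = head;
        n->tail = tail;
        return n;
    }

    /* xs and ys each share all of big's nodes: O(1) time and space
     * per "copy", even if big has a million entries. */
    void example(const node *big) {
        const node *xs = cons(1, big);
        const node *ys = cons(2, big);
        (void)xs; (void)ys;
    }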


I'm not sure any compiler actually implements any real optimizations based on const. I suppose possibly on static const variables of basic integer types, but that'd be about all it can do. The existence of the volatile escape hatch, to say nothing of const_cast<>, makes it a pretty useless hint for the compiler.
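The usual illustration of why const promises so little (the function here is hypothetical): a pointer-to-const only restricts what you may do through that pointer, not whether the object changes.

    int global = 0;

    /* p is a pointer to const, but the object it points at can still
     * be modified through another name. If f is called as f(&global),
     * the increment changes *p, so the compiler cannot fold the two
     * loads of *p into one. */
    int f(const int *p) {
        int a = *p;
        global++;          /* may modify *p */
        return a + *p;     /* must reload */
    }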


In C(++), the compiler may be able to figure that out. If the function doesn't, directly or indirectly, call any outside function, the entire call tree can be thrown away.

And that can cross libraries: if you #include a standard header, a C++ compiler may assume that calling, say, malloc() or exit() does what the standard says it does. That way, even a function that calls malloc or gettimeofday can be optimized away before it is even linked.

And of course, in many compilers, programmers can use #pragmas or attributes to tell the compiler "you can assume this function is pure / doesn't return / returns newly allocated memory / etc." (http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html).

If you do that meticulously, the compiler has way more room for optimizations.
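For instance, a few of the GCC/Clang attributes from that manual page (the function names here are made up):

    #include <stdlib.h>

    /* pure: the result depends only on the arguments and reads of
     * global memory, so identical calls can be merged or hoisted. */
    __attribute__((pure)) int table_lookup(int key);

    /* const: stricter still -- the result depends on the arguments
     * alone, never on global memory. */
    __attribute__((const)) int popcount16(unsigned short x);

    /* malloc: the returned pointer aliases nothing else, which feeds
     * the same alias analysis that restrict does. */
    __attribute__((malloc)) void *arena_alloc(size_t n);

    /* noreturn: control never comes back, so everything after a call
     * to this function is dead code. */
    __attribute__((noreturn)) void fatal(const char *msg);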


You could argue compilers are good at making transformations to a program for optimization, but they are in many ways information-starved. Cooperating with a compiler in the way http://www.yosefk.com/blog/humans-and-compilers-need-each-ot... describes implies giving the compiler more information, external to what can be gathered from the source code. Type information should be helpful (though, disclaimer: I've never written a compiler), but it's information the compiler already has... That's not really cooperating with it, just handing it stuff.


Regarding the "register" keyword, I'm pretty sure I was reading books in the 90s which described it as a historical curiosity that was ignored by all modern, right-thinking compilers.


(author here) Yes, one of our points is that 'register' is much too low-level, because one can't abstract over it. The same holds e.g. for OpenMP directives. In the paper we propose (among other things) to attach such directives to dynamic scopes. This makes it possible to e.g. unroll all innermost loops, including those in function calls, without putting unrolling directives on individual loops.


Author here. Thought I should share the github link as well: https://github.com/TiarkRompf/lancet


Have you read the papers on Tick-C (`C)? MIT, mid 90s. Pretty interesting stuff.


I thought of Futamura projections [1] while reading the abstract, and saw that they indeed talk about it in the paper. If you don't know what it is yet, be sure to check it out, since it is also quite interesting and fun.

[1] http://cs.au.dk/~hosc/local/HOSC-12-4-pp381-391.pdf


Bravo! More of this, please.

It is my great sorrow that I have no time to read these excellent things. The rest of you, please do it justice.


I am looking forward to seeing Graal eventually replace HotSpot, but so far it is only a possible Java 9 feature.


Is it a replacement for HotSpot or an add-on?


A replacement. Graal is the next generation of the Maxine JIT.

Maxine is/was a meta-circular JVM developed at Sun's research labs.

Since Oracle bought Sun, the Maxine work has evolved into Graal, and in a few presentations the goal of replacing HotSpot has been hinted at.

There are a few Oracle presentations where it is listed as a possible Java Next (8+) feature, but without any kind of commitment as to when.


I don't know if the authors would agree, but Dalvik's JIT compiler appears to be a "precision" JIT. The goal for Dalvik's JIT was to speed up apps without hurting battery life, by compiling only the parts of the code that deliver the biggest performance increase.

One can also suppose that, with more memory and more power-efficient processors, Google has calculated that in real-world cases ART's ahead-of-time compiler delivers more benefit in performance than it costs in space and power.


"precision" JIT: I'm not sure -- can you tell it what to do? More predictable performance model than HotSpot: absolutely.


This reminded me of PyPy, where (I think) you write an interpreter, then add what they call JIT hints to the interpreter to help it do a good job of JITting the interpreted code.


That's different, though: in the case of PyPy you are writing the interpreter, and you get a JIT that's generated for you and takes those hints into account.

The authors of programs that are being interpreted by that interpreter are unable to provide any information to the JIT.


Christian Wimmer, an Oracle researcher who works on the Graal and Truffle projects, gave a talk at Mozilla last year. A video of the talk is available here:

https://air.mozilla.org/one-vm-to-rule-them-all/




