The Future of Compiler Optimization (regehr.org)
80 points by dochtman on Aug 19, 2010 | 35 comments



Excellent post. Aside from the very interesting and important work of Sorin Lerner (hence the many PLDI publications), there is a workshop that deals with Prof. Regehr's first point: Compiler Optimization meets Compiler Verification, a satellite workshop of ETAPS. For interested people, this might very well be the place to start digging for relevant publications.


Surprisingly, compilers already do many of these optimizations, particularly profile-directed feedback (basically a static JIT) and the mixing of high-level and low-level optimizations.

Now, while LLVM may not do this, it's far from the only compiler on the block, especially in the non-open-source world.

The IR argument is a non-argument. Certain optimizations work better with certain IRs than others; that's just the way it is. It's unlikely that there will be one to rule them all. More likely, we'll convert between them to use the optimal one.

Superoptimizers are an interesting case; it's kind of like having a database of optimality. While it may not give much speedup compared to the time invested, if you're desperate for that last little bit of performance, it's a great tool. Plus, it should make itself faster over time. You could even locate it in the cloud, such that a compiler sends its hottest methods over to a remote machine, which uses its massive database of optimality to optimize them as well as possible.

Verifiable transformations are great in theory, but it's true that they're extremely hard to implement currently.

The author definitely has a point about AOT/JIT compilers being the future of optimization. Static compilers have been done to death, while dynamic compilers are still (comparatively) new and fresh.


The interesting thing about the peephole superoptimizer paper that he linked is that the actual peephole optimization process is not necessarily slower than a regular peephole optimizer, as all it is doing is looking up faster equivalent instructions in a database.

The difficult (compute intensive) part is training the database, and you only have to do that once for each optimization...
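
To make the lookup idea concrete, here is a minimal hand-written sketch in C. The table contents, the string-based matching, and all names are invented purely for illustration; the actual system works on canonicalized, hashed x86 instruction sequences, not strings.

    /* Hypothetical sketch of the lookup step in a peephole superoptimizer.
     * The real system canonicalizes and hashes short instruction sequences
     * into a database trained offline; strings are used here only to keep
     * the example small. */
    #include <stdio.h>
    #include <string.h>

    struct peephole_entry {
        const char *pattern;      /* canonicalized original sequence  */
        const char *replacement;  /* cheaper equivalent found offline */
    };

    /* Tiny stand-in for the trained database. */
    static const struct peephole_entry db[] = {
        { "mov r,0",  "xor r,r" },
        { "imul r,2", "add r,r" },
    };

    static const char *lookup(const char *seq) {
        for (size_t i = 0; i < sizeof db / sizeof db[0]; i++)
            if (strcmp(db[i].pattern, seq) == 0)
                return db[i].replacement;
        return seq;  /* no entry: keep the original code */
    }

    int main(void) {
        printf("%s\n", lookup("imul r,2"));  /* prints "add r,r" */
        return 0;
    }

At compile time the cost per peephole window is essentially one table lookup, which is why the pass need be no slower than a conventional hand-written peephole optimizer.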

I am a little confused about why he stated that the results of the superoptimizer were not very impressive. Presumably, man-years have been invested in defining optimizations for compilers like GCC and the Intel C compiler, by human experts...

To have a machine that even gets close to those results is impressive to me. If the technique were used in an actual compiler, it would mean that no one would have to write peephole optimizations; instead, the compiler would just get better and better as it ages.

Combined with aggressive inlining (or maybe dynamic recompilation? e.g. Lisp-style function compilation rather than C), it seems to me that this sort of technology could be incredibly powerful. Am I wrong?


Superoptimizers are good at finding complex combinations of mini-optimizations, but often optimization is about finding creative solutions to lots of special cases. Machine learning does not help you with creating more mini-optimizations.


Now, while LLVM may not do this, it's far from the only compiler on the block, especially in the non-open-source world.

Non-open-source compilers don't get much press around here. Could you point out the most interesting ones?


Visual C++ has had things like profile-guided optimization for over a decade internally to MSFT and nearly that long shipping externally (http://blogs.msdn.com/b/vcblog/archive/2008/11/12/pogo.aspx).

In general, the Microsoft and Intel C++ compilers have pretty incredible performance on the x86 platform. The last time I talked to the Intel folks, they didn't really consider GCC to be competitive. It's possible things have changed in the last five years on that front; apologies if I'm misrepresenting the state of GCC optimization quality.

The PGI Fortran folks are pretty incredible in the parallel space.

On embedded platforms, most serious companies seem to buy the Green Hills C++ compiler.

One of the program-management types from the Visual C++ team could probably do this much better justice than my quickly-fading memories. But, I think the HN crowd would be quite shocked by the market share that commercial compilers have, even on *NIX platforms.


It is not unusual to get 2x performance with icc vs gcc. But of course, their priorities are different. Sun's SPARC compilers are (were?) also excellent. Those are the benefits of targeting only one architecture and developing the compiler in close collaboration with the hardware designers.

gcc is "free" but is it free enough to double the bill for hardware? Benchmark and find out...


I have never seen such a speed difference. Can you reference anything?


Benchmarks we did at my last company (trading app in C++). But like I say, try it for yourself. There's a reason people still pay $$$ for compilers when GCC is free!


Well, I am doing compiler research, which means testing is mostly SPEC CPU. Sometimes I wonder if those measurements reflect reality, because every serious compiler is heavily optimized for these special cases. There is no 2x difference there.


You should definitely measure something from real life.

I measured something as simple as a loop in which some short floating-point calculations are made (actually only two additions per loop and one FP comparison), and my measurements on the latest Intel i5 give (in seconds):

0.31 VS2010 -O2 SSE2

0.35 VC 6 -O2

0.44 cygwin gcc-4.3.2 -O2

0.95 cygwin gcc-3.4.4 -O2

I haven't tested later 4.x gcc versions, but the results are clear -- even VC 6, 12 years old, is better than a fairly recent gcc, at least for FPU calculations.
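
The loop is roughly of the following shape (a rough hand-written sketch; the constants and trip count here are invented, not the exact benchmark):

    /* Rough reconstruction of the kind of microbenchmark described above:
     * two FP additions and one FP comparison per iteration. Constants and
     * trip count are invented; printing the results keeps the loop from
     * being optimized away entirely. */
    #include <stdio.h>

    int main(void) {
        double x = 0.0, y = 0.0;
        for (long i = 0; i < 300000000L; i++) {
            x += 0.5;
            y += 0.25;
            if (y > x)       /* one FP comparison per iteration */
                x += 1.0;
        }
        printf("%f %f\n", x, y);
        return 0;
    }

Whether the compiler emits x87 or SSE code for the additions and the comparison (see the flag discussion in the replies) can plausibly account for differences of this size.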


gcc 3.X is also really old by now. Apart from that, you should really use -O3 -march=native -msse2 -fomit-frame-pointer for best gcc performance. Anyway, at some point you have to look at the generated assembly to really know what is going on.


-march=native implies -msse2 (when possible). You might have meant -mfpmath=sse. -fomit-frame-pointer is probably not the default in VC6, so that might be unfair.

(It actually is the default for linux and darwin now, because DWARF unwind tables obsolete frame pointers for most uses.)


Calculating yield curves is pretty "real world", as is transactions/sec. If the benchmark is representative, then it is a good benchmark, and if the compiler is optimized for it, then it is optimized for real-world use cases too.


AFAIR from a course at university, the status quo (about five years ago) was that gcc did not implement many of the advanced optimization techniques, such as IPS (integrated pre-pass scheduling). I have not checked myself, just promulgating--albeit competitive--hearsay...


gcc and llvm don't have a technique with that name, but it sounds a little like an implementation detail to me. What is it prior to and/or integrated with?

gcc supports profile-guided optimization just fine; llvm has some code for it, but I'm not sure if it's hooked up. Neither of them uses iterative techniques for optimization - they're already too slow as it is for most people, anyway.


IPS means that instruction scheduling is performed before register allocation, but with register allocation kept in mind to reduce register pressure.

There was some other stuff, but I cannot remember (I am actually glad to have been able to come up with IPS.)


See http://gcc.gnu.org/ml/gcc-patches/2009-09/msg00003.html.

Instruction scheduling of any kind doesn't really help on x86 anyway, and register pressure is usually surprisingly good already (since temporary values are moved close to their uses when combining instructions). I think the most important missing thing is rematerialization - recalculating values instead of saving them on the stack would save a lot of memory loads.
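
A hand-written illustration of the idea (compilers do this on their IR and machine code, not on C source; the functions and constants below are invented just to show the trade-off):

    /* Hypothetical illustration of rematerialization. A value that is cheap
     * to recompute (here p + 16) need not be spilled to the stack and
     * reloaded when registers run out; it can simply be recomputed at its
     * use. Both functions compute the same result. */
    #include <stdint.h>

    int64_t keep_alive(int64_t p, int64_t *buf) {
        int64_t t = p + 16;          /* computed early and kept live...      */
        for (int i = 0; i < 64; i++) /* ...across a busy region; if the loop */
            buf[i] = buf[i] * 3 + i; /* needs every register, t may be       */
        return t + buf[0];           /* spilled and reloaded from the stack  */
    }

    int64_t rematerialized(int64_t p, int64_t *buf) {
        for (int i = 0; i < 64; i++)
            buf[i] = buf[i] * 3 + i;
        return (p + 16) + buf[0];    /* recompute: one add beats a reload */
    }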


I wonder, which compilers does Google use?


I think he emphasizes the joining of verification and optimization too much. No doubt verification may get easier if it uses the IR and all the analysis info the compiler already computed. SSA makes this stuff easier, for example.

Just tell me: what could optimization learn from verification? Everything verification does is too slow to do within a compiler. Sure, you can produce better code if you are allowed to use NP algorithms, but in reality I have never seen any algorithm more complex than O(n^2) in a compiler. Usually O(n log n) is the most one can afford.


A deeper understanding of the program - which is what verification brings to the table - may be useful; even if it is too slow for most code, it can occasionally be used to detect bugs/speed up the most critical parts.

It would also be rather useful, although probably quite a bit of work, to have a "verify the optimizations you did" mode. While most errors aren't compiler errors, gcc has been known to optimize bugs into programs...


Arguably, types are a form of verification, and they already lead to better optimizations.


Types were invented for optimization and used for verification later, weren't they?


What is the most advanced compiler, across all high-level (not C/C++) languages? The GHC?


What do you mean by "most advanced"?

* LLVM is probably the open-source compiler that best fits your intention.

* GCC has man-years of effort invested in little bit/byte tweaks, which no other compiler has. This is not advanced, but tedious.

* libFirm is the only compiler which does not deconstruct SSA form. This is advanced in a mostly academic sense.

* GHC is probably the best functional (CPS) compiler.

* Sun's HotSpot JVM is probably the best online (JIT) compiler.

* I'm not sure which compiler is "most advanced" for dynamically typed code. Some Smalltalk or Javascript compiler? PyPy?


I think the Milepost project is at least worth mentioning in this list:

* http://ctuning.org/wiki/index.php/CTools:MilepostGCC

Smalltalk JITs are pretty darn good; e.g., the Strongtalk project is known to be pretty fast. In 2006, Sun released the Strongtalk source code. Its publication record is relatively weak (aside from type-system-related publications), but it contains a wealth of relevant optimizations, and I am sure the interested reader/programmer will find something valuable in there. (http://www.strongtalk.org)

Eliot Miranda has been implementing Smalltalk VMs for quite a while, and I think his "Cog" VM for Squeak is probably the most recent addition to JIT compilers for Smalltalk. Given his in-depth experience and expertise (particularly with inline caching), it could probably serve as a blueprint for other (Smalltalk) JITs.

V8 for Javascript is supposedly very fast (interesting side information: Robert Griesemer is working on V8, but worked on the Strongtalk interpreter before), but I don't know about the involved benchmarks, and how they stack up against each other--particularly since the TraceMonkey trace-based JITs came along.

Mike Pall's LuaJIT is a very interesting project (the only one-man JIT project I know of), too.

PS: I am sorry for the overly long post...


Milepost GCC is interesting. Regehr does not believe in it, though: "machine learning in compilers will not end up being fundamental."

His argument is flawed, though. He says "I could always get a good optimization result by running all of my optimization passes until a fixpoint is reached," but unfortunately there is no such fixpoint. Many optimizations reverse each other (e.g. loop fusion vs loop splitting) or just arbitrarily choose some normalization (e.g. 2*x vs x+x vs x<<1).
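
A toy illustration of the normalization point: all three functions below are equivalent, and local rewrite rules can turn any form into any other, so a pass pipeline that simply reruns until nothing changes has no natural place to stop unless one form is arbitrarily declared canonical.

    /* Three equivalent ways to compute 2*x; rewrite rules can convert any of
     * them into any other, so there is no fixpoint unless one form is chosen
     * as the canonical normalization. */
    int twice_mul(int x)   { return 2 * x; }
    int twice_add(int x)   { return x + x; }
    int twice_shift(int x) { return x << 1; }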

You can build a superoptimizer which constructs all variations (e.g. equality saturation, http://portal.acm.org/citation.cfm?doid=1480881.1480915), though this is not a fixpoint search but an optimization problem of choosing the least-cost alternative. You cannot construct all variations anyway; for a simple example, consider loop unrolling an infinite loop.

Hence, unlike Regehr I would not devalue machine learning. I would not bet on it either, though. ;)


The most advanced dynamically typed code compiler is either Self or Stalin.


Interesting - http://en.wikipedia.org/wiki/Stalin_(Scheme_implementation)

When do we expect Hitler, Churchill and Roosevelt?


"Stalin brutally optimizes."


The question was intentionally vague. I'm looking for interesting threads of thought. Thanks for your post.

libFirm is the only compiler which does not deconstruct SSA form

I thought that continuation-passing-style compilers didn't use SSA, and that most Scheme implementations were CPS. No?


SSA and CPS are equivalent: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.67...

In SSA form, most operations (add, sub, ...) are side-effect free, and the rest (load, store, call, ...) can be understood as using something like a "Memory Monad".
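
A hand-written sketch of what that looks like (the IR notation below is invented, loosely in the style of such compilers):

    /* The pure arithmetic below has no side effects; in an SSA IR only the
     * memory operations need to be ordered, which is done by threading an
     * explicit memory value through them (the "memory monad" idea).
     * Invented IR, roughly:
     *
     *   t1 = add x, y            ; pure, no memory operand
     *   m1 = store m0, p, t1     ; takes memory m0, produces memory m1
     *   t2 = load  m1, q         ; ordered after the store via m1
     *   ret t2
     */
    int f(int x, int y, int *p, int *q) {
        *p = x + y;
        return *q;
    }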


From what I've heard, probably MLton.


Can you explain why? Neither their webpage nor Wikipedia goes much beyond "produces fast executables".


I found this description of the passes and intermediate languages used in mlton: http://mlton.org/CompilerOverview

Another candidate for the most advanced compiler is Stalin.



