JIT gives you more information about how the program is actually run. This allows a large number of optimistic optimizations that GCC etc... can not make.
This in theory allows faster code to be generated. Than ahead of time compilers can. In practice this is destroyed by the pointer chasing default datastructure in java.
If you are executing "C" using JVM techniques then a research JVM JITing C code gets within 15% of GCC on a reasonable set of benchmarks. (http://chrisseaton.com/plas15/safec.pdf).
These benchmarks are favorable for the AOT GCC as they have few function pointers, startup configuration switches and little dataflow variety.
I suspect that specific compiler optimizations on the C code for the dense matrix transform in GCC are bigger part of why GCC is faster than the fact that it is AOT instead of JIT.
There are also a number of AOT compilers for Java e.g. excelsior. But I have not seen superior performance with that over an equivalent hotspot release.
> JIT gives you more information about how the program is actually run. This allows a large number of optimistic optimizations that GCC etc... can not make.
They'd be surprised to learn that considering GCC supports profile-guided optimisation.
Modern JITs basically are very sophisticated profile-guided optimizing compilers. Unlike AOT PGOs, though, JITs can adapt to changing program phases, that are common in server applications (e.g. when certain monitoring/troubleshooting/graceful-degradation features are turned on). On the whole, JITs can produce better code.
But this, too, is a tradeoff. Some JITs (like JS's in the browser) require fast compilation, so some optimizations are ruled out. Very sophisticated optimizations require time, and therefore are often useful when you have tiered compilation (i.e. a fast JIT followed by an optimizing JIT), and tiered compilation is yet another complication that makes JITs hard to implement.
An AOT compiler can decide to compile a JIT code generator into the binary, so even in theory AOT beats JIT. In practice it almost never makes sense to do this for a language that isn't full of gratuitous dynamism.
> An AOT compiler can decide to compile a JIT code generator into the binary, so even in theory AOT beats JIT
That is not AOT. That is compiling some code AOT and some JIT[1]. It's not a contest. Those are two very common compilation strategies, each with its own pros and cons.
> In practice it almost never makes sense to do this for a language that isn't full of gratuitous dynamism.
What you call "gratuitous dynamism" others call simple, general abstractions. BTW, even Haskell/ML-style pattern matching qualifies as "dynamism" in this case, as this is something a JIT optimizes just as it does virtual dispatch (the two are duals).
[1]: Also, a compiler doesn't generally insert huge chunks of static code into its output. That's the job of a linker. What you've described is a system comprised of an AOT compiler, a JIT compiler, and a linker that links the JIT compiler and the AOT-compiled code into the binary. Yeah, such a system is at least as powerful as just the JIT component alone, but such a system is not called an AOT compiler.
JITs could only optimize pattern matching if one particular branch is picked all the time, but not statically knowable. That is a very niche case.
Really, in almost all cases where JITs are effective it's because of a language where you have to use a dynamic abstraction when it should have been a static one. The most common example being virtual dispatch that should have been compile time dispatch. Or even worse, string based dispatch in languages where values are semantically string dictionaries (JS, Python, Ruby). A JIT is unnecessary in languages with a proper distinction between compile time abstractions and run time constructs. Names should be a purely compile time affair. Instantiating an abstraction should happen at compile time in most cases. There goes 99% of the damage that a JIT tries to undo, along with a lot of the damage that a JIT doesn't undo. Sadly most languages lack these abilities. Rust is one of the few languages that has a sensible story for this, at least for the special case of type abstraction, traits, and lambdas. I'm still hoping for a mainstream language with full blown staged programming support, but I guess it will take a while.
> JITs could only optimize pattern matching if one particular branch is picked all the time, but not statically knowable.
They would optimize it if one particular branch is picked all (or even most of) the time but not statically knowable at any given (extended) call site, and by "extended" I mean up to any reasonable point on the stack. That is not a niche case at all. Think of a square-root function that returns a Maybe Double. At most call sites you can optimize away all matchings, but you can't do it statically. Statically-knowable is always the exception (see the second-to-last paragraph).
> Names should be a purely compile time affair.
That doesn't work too well with dynamic code loading/swapping... But in any case, you're now arguing how languages should be designed. That's a whole other discussion, and I think many would disagree with you.
> There goes 99% of the damage that a JIT tries to undo, along with a lot of the damage that a JIT doesn't undo.
At least theoretically that's impossible. The problem of program optimization is closely related to the problem of program verification (to optimize something you need proof of its runtime behavior), and both face undecidable problems that simply cannot be handled statically. A JIT could eliminate any kind of branching at any particular site -- be it function dispatch, pattern matching or a simple if statement -- and eliminating branches opens the door to a whole slew of optimizations. Doing branch elimination statically is simply undecidable in many, many cases, regardless of the language you're using. If you could deduce the pertinent information for most forms of (at least theoretical) optimization, you've just solved the halting problem.
Whether or not what you say is true in practice depends on lots of factors (like how far languages can take this and still be cheaply-usable, how sophisticated JITs can be), and my bet is that it would end up simply being a bunch of tradeoffs, which is exactly where we started.
How much of the compiled code that runs day to day is shipped with a PGO build? How many packages are compiled at O2 instead of O3 in the average linux distribution.
Even then you assume the profile resembles the real program input. Not always true.
So GCC supports PGO but nearly no one uses it. On the JIT side at least all programs runing on the common JVMs and .NETs use PGO.
Even with PGO, one can't make all the optimizations one can in a managed runtime. e.g. null pointer exceptions -> dereferencing a null pointer is handled in the rare case in the JVM by segfaulting and dealing with it in a signal handler (after 3x this code will be deoptimized and turned into an if null check). This is an approach GCC cannot take as it cannot install signal handlers, so at best it can do the check and mark the segfault path as unlikely.
> How much of the compiled code that runs day to day is shipped with a PGO build? How many packages are compiled at O2 instead of O3 in the average linux distribution.
That's irrelevant to my objection. Also O3 isn't necessarily a gain over O2, so it doesn't make sense to blanket-compile everything as O3.
> Even then you assume the profile resembles the real program input. Not always true.
That is at least somewhat relevant, but your original comment stated that GCC can not make usage-based optimisations, not that it won't always make the right ones.
> Even with PGO, one can't make all the optimizations one can in a managed runtime. e.g. null pointer exceptions -> dereferencing a null pointer is handled in the rare case in the JVM by segfaulting and dealing with it in a signal handler (after 3x this code will be deoptimized and turned into an if null check). This is an approach GCC cannot take as it cannot install signal handlers, so at best it can do the check and mark the segfault path as unlikely.
GCC can do one better, given dereferencing null pointers is illegal it doesn't need to handle that at all, and thus has no need to optimise something which doesn't exist.
>> Even then you assume the profile resembles the real program input. Not always true.
>That is at least somewhat relevant, but your original comment stated that GCC can not make usage-based optimisations, not that it won't always make the right ones.
There are things that PGO can not optimize away that an optimistic JIT+managed runtime can.
> GCC can do one better, given dereferencing null pointers is illegal it doesn't need to handle that at all, and thus has no need to optimise something which doesn't exist.
Unfortunately, null pointer deference do exist even if their behavior is "implementation specific" in C, segfaults are depressingly common. And if you take CCJ as an AOT compiler then GCC as compiler collection can not make that optimization.
I think this points to a philosophical in difference between JVM compilers and GCC+ICC compilers. The JVM optimizes things away because behavior is defined in a certain way, while GCC tends to say you are asking for undefined behavior so I don't bother to do what you asked at all.
So I would not call it do one "better, it really is do one "worse"