> It's really irritating, as someone who works in C and C++ with good reason, to hear people continually deny the very real performance benefits of working at a lower level.
As someone who also works at a very low level, I also know the limits of precompiled optimizations!
In perfect theory land, a Sufficiently Smart JITTER will beat out a Sufficiently Smart Compiler, if for no other reason than the JITTER can always take advantage per CPU optimizations for CPUs newer than whenever precompiled code was compiled for! e.g. in theory code written ages ago gets free performance boosts.
JITTERs also have the benefit of knowing the state at run time. Doing things like only compiling code that is actually being used right now means in theory fitting more stuff throughout all layers of the memory subsystem, and we all know how important cache hit rates are to performance!
JITTERs also have access to the entire bytecode of a program, which lets it do even stranger optimizations if it so decides (again, sufficiently smart), where as a compiler cannot do much about libraries you link to dynamically (or even statically, doing optimizations on pure assembly is Not Fun)
Of course some compiler tools, such as Link Time Code Generation (also called Whole Program Optimization), and Profile Guided Optimizations can get you close to what a JITTER has by feeding the compiler a ton of additional data at compile time, but again all you have done is tried to give the compiler an approximation of what a JITTER already has available to it.
Now one thing C++ most certainly wins out on is that it is possible to create very thin lightweight wrappers around functionality, which will have huge perf gains in comparison to the multilevel abstractions that software engineers (myself included!) tend to enjoy creating when they get a hold of a VM based language (be it JVM, CLR, or pick your favorite bytecode).
> How can a JIT runtime be faster the first time code runs? How can it load faster if it has to load all the compilation code to run the process?
That depends on a lot of variables, such as where the code is loaded from, and IO capacity vs. CPU capacity, and code density. There's an interesting PhD thesis from ETH, back in 1994, on Semantic Dictionary Encoding (by now Dr. Michael Franz) that demonstrated a on-the-fly code generation system for Oberon where most or all of the code generation cost was covered by the reduced size of the executables on the then-current systems, which allowed loading the data from disk or network faster (the representation was in effect close to a compressed intermediate representation syntax tree, and was "uncompressed" by generating the code and reusing code fragments generated).
There's the difference between theory and practice though - I keep being disappointed every single time I try a Java based app. I don't know if it's the JVM or the compiler, or the language, or just the way Java developers write code, or if I'm just somehow fooling myself, but every single Java app I've used have felt horribly sluggish and bloated.
> every single Java app I've used have felt horribly sluggish and bloated.
It's the startup time (mostly compilation), compounded by the fact that Java loads classes lazily (so a class is loaded and compiled the first time you perform an action that uses it). Long-running Java server applications fly like the wind.
The JRE classes are, I believe, precompiled and saved in a cache. It is possible to add your own code to the cache to significantly reduce startup time.
BTW, the classes are not just compiled once. They're compiled, monitored, re-optimized and re-compiled and so on. It's quite possible for a Java app to take a couple of minutes before it reaches a steady state. Of course, loading more classes at runtime (or hot code swapping) can start the process again, as well as a change in the data that makes different optimizations preferable.
Sometimes, when going back to C, I'm amazed at how fast an application can start (I'm not used to that). But then I see performance after a few minutes and think, "damn, the current data dictates a certain execution path and all I have to rely on is the CPU's crappy branch prediction? where's my awesome JIT?"
There's a discussion in the Java community on how to do this best. Security is a problem. You need to make sure the compiled code matches the bytecode (that undergo security checks). But how do you compute a checksum for the compiled, cached code, if in order to test that checksum you need to re-generate it from the bytecode?
So Java caches the compiled code for the JRE classes only, and it's possible to add your own code to the cache (requires root permission, etc.)
As someone who also works at a very low level, I also know the limits of precompiled optimizations!
In perfect theory land, a Sufficiently Smart JITTER will beat out a Sufficiently Smart Compiler, if for no other reason than the JITTER can always take advantage per CPU optimizations for CPUs newer than whenever precompiled code was compiled for! e.g. in theory code written ages ago gets free performance boosts.
JITTERs also have the benefit of knowing the state at run time. Doing things like only compiling code that is actually being used right now means in theory fitting more stuff throughout all layers of the memory subsystem, and we all know how important cache hit rates are to performance!
JITTERs also have access to the entire bytecode of a program, which lets it do even stranger optimizations if it so decides (again, sufficiently smart), where as a compiler cannot do much about libraries you link to dynamically (or even statically, doing optimizations on pure assembly is Not Fun)
Of course some compiler tools, such as Link Time Code Generation (also called Whole Program Optimization), and Profile Guided Optimizations can get you close to what a JITTER has by feeding the compiler a ton of additional data at compile time, but again all you have done is tried to give the compiler an approximation of what a JITTER already has available to it.
Now one thing C++ most certainly wins out on is that it is possible to create very thin lightweight wrappers around functionality, which will have huge perf gains in comparison to the multilevel abstractions that software engineers (myself included!) tend to enjoy creating when they get a hold of a VM based language (be it JVM, CLR, or pick your favorite bytecode).