It's really irritating, as someone who works in C and C++ with good reason, to h...

com2kid · on Oct 7, 2013

> It's really irritating, as someone who works in C and C++ with good reason, to hear people continually deny the very real performance benefits of working at a lower level.

As someone who also works at a very low level, I also know the limits of precompiled optimizations!

In perfect theory land, a Sufficiently Smart JITTER will beat out a Sufficiently Smart Compiler, if for no other reason than that the JITTER can always take advantage of per-CPU optimizations for CPUs newer than the ones the precompiled code was compiled for! E.g., in theory, code written ages ago gets free performance boosts.

JITTERs also have the benefit of knowing the state at run time. Doing things like only compiling code that is actually being used right now means in theory fitting more stuff throughout all layers of the memory subsystem, and we all know how important cache hit rates are to performance!

JITTERs also have access to the entire bytecode of a program, which lets them do even stranger optimizations if they so decide (again, sufficiently smart), whereas a compiler cannot do much about libraries you link to dynamically (or even statically; doing optimizations on pure assembly is Not Fun).

Of course some compiler tools, such as Link Time Code Generation (also called Whole Program Optimization) and Profile Guided Optimization, can get you close to what a JITTER has by feeding the compiler a ton of additional data at compile time, but again all you have done is try to give the compiler an approximation of what a JITTER already has available to it.

Now one thing C++ most certainly wins out on is that it is possible to create very thin lightweight wrappers around functionality, which will have huge perf gains in comparison to the multilevel abstractions that software engineers (myself included!) tend to enjoy creating when they get a hold of a VM based language (be it JVM, CLR, or pick your favorite bytecode).

illumen · on Oct 7, 2013

How can a JIT runtime be faster the first time code runs? How can it load faster if it has to load all the compilation code to run the process?

Yeah, each has their own benefits. AOT compilation is even theoretically faster in some situations, as well as being faster in practice.

vidarh · on Oct 7, 2013

> How can a JIT runtime be faster the first time code runs? How can it load faster if it has to load all the compilation code to run the process?

That depends on a lot of variables, such as where the code is loaded from, IO capacity vs. CPU capacity, and code density. There's an interesting PhD thesis from ETH, back in 1994, on Semantic Dictionary Encoding (by the now Dr. Michael Franz) that demonstrated an on-the-fly code generation system for Oberon, where most or all of the code generation cost was covered by the reduced size of the executables on the then-current systems, which allowed loading the data from disk or network faster. (The representation was in effect close to a compressed syntax-tree intermediate representation, and was "uncompressed" by generating the code and reusing already generated code fragments.)

There's the difference between theory and practice though - I keep being disappointed every single time I try a Java based app. I don't know if it's the JVM or the compiler, or the language, or just the way Java developers write code, or if I'm just somehow fooling myself, but every single Java app I've used has felt horribly sluggish and bloated.

pron · on Oct 7, 2013

> every single Java app I've used have felt horribly sluggish and bloated.

It's the startup time (mostly compilation), compounded by the fact that Java loads classes lazily (so a class is loaded and compiled the first time you perform an action that uses it). Long-running Java server applications fly like the wind.
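To make the lazy part concrete, here is a minimal sketch. It only shows class initialization being deferred until first use, which the language guarantees; when the JVM actually loads and JIT-compiles the class is up to the implementation. `LazyLoadingDemo` and `RarelyUsedFeature` are hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyLoadingDemo {
    // Records the order in which things happen.
    static final List<String> EVENTS = new ArrayList<>();

    // Hypothetical class standing in for "a feature the user rarely clicks on".
    static final class RarelyUsedFeature {
        static {
            // Runs the first time the class is actively used, not at program start.
            EVENTS.add("RarelyUsedFeature initialized");
        }

        static int compute() {
            return 42;
        }
    }

    static List<String> run() {
        EVENTS.add("application started");
        int value = RarelyUsedFeature.compute(); // first use triggers initialization
        EVENTS.add("feature returned " + value);
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The "initialized" event lands after "application started", so the cost of bringing the class up is paid in the middle of the user's first action rather than at launch.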

The JRE classes are, I believe, precompiled and saved in a cache. It is possible to add your own code to the cache to significantly reduce startup time.

BTW, the classes are not just compiled once. They're compiled, monitored, re-optimized and re-compiled and so on. It's quite possible for a Java app to take a couple of minutes before it reaches a steady state. Of course, loading more classes at runtime (or hot code swapping) can start the process again, as well as a change in the data that makes different optimizations preferable.
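A rough way to watch this happen is to time identical batches of the same work and see whether the early ones are slower. This is only a sketch: `WarmupDemo` and its workload are made up, `System.nanoTime()` timing like this is not a rigorous benchmark, and the numbers will vary by JVM and machine.

```java
final class WarmupDemo {
    // Written to a volatile field so the JIT cannot discard the computation.
    static volatile long sink;

    // Simple CPU-bound work: sum of (i * i) mod 7 for i in [0, n).
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (long) i * i % 7;
        }
        return acc;
    }

    // Runs the same workload several times and returns nanoseconds per batch.
    static long[] timeBatches(int batches, int n) {
        long[] nanos = new long[batches];
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            sink = work(n);
            nanos[b] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeBatches(10, 5_000_000);
        for (int b = 0; b < nanos.length; b++) {
            System.out.println("batch " + b + ": " + nanos[b] / 1_000 + " us");
        }
    }
}
```

On a typical JIT-enabled JVM I'd expect the first batch or two to be noticeably slower than the rest, but that is an expectation, not something the code guarantees.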

Sometimes, when going back to C, I'm amazed at how fast an application can start (I'm not used to that). But then I see performance after a few minutes and think, "damn, the current data dictates a certain execution path and all I have to rely on is the CPU's crappy branch prediction? where's my awesome JIT?"

rat87 · on Oct 7, 2013

Potentially it can save the jit results from the last execution.

pron · on Oct 7, 2013

There's a discussion in the Java community on how to do this best. Security is a problem. You need to make sure the compiled code matches the bytecode (which undergoes security checks). But how do you compute a checksum for the compiled, cached code, if in order to test that checksum you need to re-generate it from the bytecode?

So Java caches the compiled code for the JRE classes only, and it's possible to add your own code to the cache (requires root permission, etc.)

kadaj · on Oct 7, 2013

Write JVM in Haskell, compile it and run natively.

pron · on Oct 7, 2013

I actually worked for years writing C/C++ software for the defense industry (including hard real-time and safety critical systems), and let me tell you that unless you write C++ in a very small team of experts, working months on micro-optimizations, your Java code will beat C++ nine times out of ten. We've since decided to switch to real-time Java even in our hard real-time systems and never looked back.

millstone · on Oct 7, 2013

My understanding is that real-time software optimizes for worst-case latency, even at the cost of making the overall program slower. That's pretty unusual, and most software prefers a different set of tradeoffs.

Consider a project like WebKit, which is definitely not comprised of a "very small team." Does anyone honestly believe that WebKit would be faster if it were written in Java?

pron · on Oct 7, 2013

No, but that's because WebKit has other requirements, like a short startup time as well as memory constraints. If WebKit were a long-running server process, a Java version could well be faster (although WebKit's most performance-critical code runs on the GPU anyway, where Java wouldn't have any advantage).

There are other issues as well (maybe I'll write a blog post about them): when it comes to throughput, Java is extremely hard to beat (i.e., it's certainly possible, but not without a lot of work); when it comes to latency, a Java project needs work, too; when it comes to a lot of concurrency, it's almost impossible to beat Java with C++, even with a lot of work (depending on the complexity of the concurrent code).

hershel · on Oct 7, 2013

If I'm not mistaken, real-time Java doesn't use the JIT compiler (because it's pretty much impossible to guarantee determinism with a JIT).

Is this true? And even with this constraint, Java gets similar speed to C++?

pron · on Oct 7, 2013

Ah, the beauty of real-time Java is that it lets you mix hard real time code and soft- or non-realtime code in the same application, on different threads (and different classes).

Those classes designated as hard realtime will be compiled AOT, and will enjoy deterministic performance at the expense of sheer speed. Realtime threads are never preempted by JIT, or even by GC, as you run them within something similar to an arena memory region.

The idea is that only a small part of the application requires such hard deadline guarantees, and the rest should enjoy the full JVM treatment.

pjmlp · on Oct 7, 2013

> Write the next JVM in Java if you think it's that easy.

Jikes, http://jikes.sourceforge.net/

Oh, and post Java 8, Hotspot might be replaced with Graal, a new JIT done in Java:

https://wiki.openjdk.java.net/display/Graal/Publications+and...

haberman · on Oct 7, 2013

> Jikes, http://jikes.sourceforge.net/

That is not a JVM, that is a Java compiler. Surely you know the difference.

> Oh and post Java 8, Hotspot might be replaced with Graal a new JIT done in Java,

Good luck getting reasonable performance out of that.

But even if the JVM decides to go that way, a JIT is only one part of a VM. The interpreter and the GC will take particularly bad perf hits if you try to write them in Java.

lucian1900 · on Oct 7, 2013

I believe this is what was meant to be linked http://jikesrvm.org/

That's an entire JVM written in Java, with only a small bit of C to bootstrap the JIT.

pjmlp · on Oct 7, 2013

Correct, pasted the wrong link.

Thanks for pointing that out.

pjmlp · on Oct 7, 2013

A JVM does not require an interpreter, that is an implementation detail, not part of the language.

As for the GC, I can give the Squawk example, which has a GC done in Java and targets embedded systems, with C/C++ only being used for the hardware integration part.

http://www.sunspotworld.com/

https://java.net/projects/squawk/pages/Home

http://en.wikipedia.org/wiki/Squawk_virtual_machine

http://www.sunspotworld.com/docs/Yellow/javadoc/com/sun/squa...

http://dl.acm.org/citation.cfm?id=1094908

Developers should learn about compiler design and not mix languages with implementations.

haberman · on Oct 7, 2013

> As for the GC, I can give the Squawk example which has a GC done in Java, targets embedded systems, with C/C++ only being used for hardware integration part.

I didn't say it was impossible, just slow, and it is:

"My third reservation is Squawk's performance, which is roughly that of the J2ME-derived Java KVM introduced a few years ago, an interpreted JVM that is written in C. From everything I have learned about it, the developers are assuming that most applications for the SPOT platform will include processors of sufficient power and that large amounts of memory will be available for garbage collection, pointer safety, exception handling and a mature thread library for thread sleep, yield and synchronization. That is not true in many cases, and if SPOT is limited to only sensor applications that are not performance-constrained, the platform is interesting but not all that important in the long run."

http://www.embedded.com/electronics-blogs/cole-bin/4025677/S...

pjmlp · on Oct 7, 2013

It just needs to be fast enough to fulfill the required tasks, not a blazing thunder.

This is how C replaced Assembly in many use cases, C++ replaced C in many use cases, ...

vidarh · on Oct 7, 2013

A large part of the asm => C and C => C++ moves is that C and C++ very, very strictly follow the principle that you pay only for what you use, and that pretty much all C and C++ environments make inline assembler quite easy to do.

In other words: You can write C++ that is no slower than C, and you can write C that is only rarely slower than ASM, and in both cases, in the few situations where the performance difference is large enough to be noticeable and matter, you can still easily write asm.

This is also why C and C++ keep being used in the face of so many higher level languages.

And I say this as someone who spends most of his time writing Ruby, and who hasn't even bothered upgrading to MRI 2.x despite the performance improvements available.

In other words: I agree with your first line. But moving to C/C++ is a totally different tradeoff than moving to most higher level languages, most of which at the very least make the performance characteristics much harder to predict.

pkolaczk · on Oct 7, 2013

> you pay only for what you use

This is only half of the equation. The other half is what are the costs of the things you do use. E.g. costs of exceptions, virtual calls, dynamic linking, dynamic memory allocation, concurrency, etc. Java is like an all-inclusive offer - you get more at a higher cost, probably with something you don't need, but often it is cheaper than to buy every service one by one.

pjmlp · on Oct 7, 2013

There are higher level languages like Modula-2 and Ada that offer type safety and that, once upon a time, had compilers with performance comparable to C's, back when C was UNIX only.

Of course, 30 years of industry investment into C compiler backends changed this relation.

Unfortunately, thanks to Sun's and Microsoft's VM über alles attitude in the past decade, young developers tend to equate higher level languages with VMs and think C and C++ are the only languages with proper native code compilers.

derleth · on Oct 7, 2013

> you pay only for what you use

In assembly and C, but not C++, at least if you use exceptions:

http://yosefk.com/c++fqa/exceptions.html#fqa-17.1

Of course you can pare down C++ further and further, but then you have C with slightly different type-checking semantics and you might as well move to a language where 'new' isn't a reserved word.

swift · on Oct 7, 2013

I agree that the GC is a bad fit but I don't see any reason to believe that the interpreter is a particularly bad thing to write in Java. Can you elaborate on your reasoning?

haberman · on Oct 7, 2013

The usual suspects mostly (guards/bounds checks, excessive boxing/unboxing, hard to do clever things wrt. memory, etc), magnified because an interpreter is in the inner loop of an interpreted VM.

justincormack · on Oct 7, 2013

Interpreters are very hard to optimise automatically: in effect they are one huge case statement for the whole language. C compilers don't handle that well, so interpreters are still often written in assembler.
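For anyone who hasn't seen one, the "huge case statement" shape looks roughly like this. It's a toy stack machine with made-up opcodes (`PUSH`, `ADD`, `MUL`, `HALT`), not real JVM bytecode, and `TinyInterpreter` is a hypothetical name.

```java
final class TinyInterpreter {
    // Hypothetical opcodes for a toy stack machine.
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    // Executes the program and returns the value on top of the stack at HALT.
    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0; // index of the next free stack slot
        int pc = 0; // index of the next instruction
        while (true) {
            int op = code[pc++];
            // One dispatch per instruction: this switch is the interpreter's inner loop.
            switch (op) {
                case PUSH:
                    stack[sp++] = code[pc++]; // operand follows the opcode
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

Every `stack[...]` and `code[...]` access here is bounds-checked by the language, which is the kind of per-instruction overhead mentioned upthread, unless the compiler can prove the checks away.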

vidarh · on Oct 7, 2013

But in this case they're talking about writing it in Java, and they have control over the Java compiler. In other words they have plenty of opportunities to ensure it optimises well:

- Focus on ensuring the overall pattern optimises well.

- Or recognising the special case of the JVM interpreter loop

- Or adding a pragma of some sort to trigger special optimisations.

Making it fast when compiled with an arbitrary compiler is another matter. But then even a lot of interpreters written in C often resort to compiler-specific hackery.

justincormack · on Oct 7, 2013

Recognising special cases is fragile. And the special optimisation may as well be "replace with this hand written code"... Compilers, well parts of them, are not really an example of the typical code you are trying to optimise.

pron · on Oct 7, 2013

First, nobody's telling you you're wasting time writing a kernel (or the JVM for that matter) in C++. Obviously, C/C++ has its place (kernel, drivers, VM, resource-constrained embedded systems). But a large Java application is usually more performant than the equivalent C++ code, especially if a lot of concurrency is involved.

Second, where it matters most, Java offers a similar level of access to the hardware (memory fences, CASs, etc.). There are areas where the JVM still needs improvement, though (mostly arrays of structs), but the cases where C++ is preferable to Java in large server-side application software are getting fewer by the day.
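As a small illustration of the CAS access mentioned above, here is a sketch of a lock-free increment built on `AtomicInteger.compareAndSet` from `java.util.concurrent.atomic`. The class and method names (`CasCounterDemo`, `incrementWithCas`, `runThreads`) are made up; in real code you would just call `incrementAndGet()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasCounterDemo {
    // Classic CAS retry loop: read, compute, try to publish, retry on contention.
    static int incrementWithCas(AtomicInteger counter) {
        while (true) {
            int current = counter.get();
            int next = current + 1;
            if (counter.compareAndSet(current, next)) {
                return next;
            }
            // Another thread won the race; loop and try again with the fresh value.
        }
    }

    // Hammers one counter from several threads and returns the final value.
    static int runThreads(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    incrementWithCas(counter);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 100_000)); // prints 400000
    }
}
```

No mutex is involved, and no increment is lost, because a failed `compareAndSet` simply retries.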

Third, there are performance benefits to working at a low level, but reaping them comes at a very high cost. Often you'll find that when working with a large team you need to make concessions on performance for the sake of maintainability. One example is virtual methods. They're good for maintainability and project structure, but bad for performance. Java solves this problem with its JIT. Another problem is memory allocation. Allocation schemes can get very complex in the face of concurrency (and damn hard if using lock-free DS). A C++ developer will usually prefer to tackle this with well defined allocation/deallocation policy and the possible use of mutexes; this harms performance. Java solves this problem with a good GC.
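To sketch the virtual-method point: in the code below every call to `area()` is an interface call in the source, but only one implementation is ever loaded. In that situation a JIT such as HotSpot's may treat the call site as monomorphic and inline it, with a fallback if another implementation shows up later. The code can't prove that this happens; it only shows the shape of code that benefits. All names here are hypothetical.

```java
final class DevirtualizationDemo {
    // Abstraction that is nice for project structure.
    interface Shape {
        double area();
    }

    // The only implementation the running program ever sees.
    static final class Square implements Shape {
        private final double side;

        Square(double side) {
            this.side = side;
        }

        @Override
        public double area() {
            return side * side;
        }
    }

    // Hot loop: one interface call per element in the source code.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = {new Square(1), new Square(2), new Square(3)};
        System.out.println(totalArea(shapes)); // prints 14.0
    }
}
```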

> while giving us stuff like Eclipse

Java on the client is far from optimal (although NetBeans beats most native IDEs I've seen, in terms of looks and performance). But you are completely unfamiliar with the entire Java landscape. Most Western defense systems (including weapon systems, and including software that controls aircraft carriers) are written in Java nowadays. Most banking systems and many high-frequency trading platforms, too.

* Unless one is working for a long time with a small team of experts on optimizing said C++ code.

hermanhermitage · on Oct 7, 2013

This is what happens when an important variable (the specific problem domain of performance comparison) is left as a free variable :-)

waps · on Oct 7, 2013

I think you'll find that Java's propensity to box everything effectively means that beating it is rather easy in pretty much any application that actually uses some memory.
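A minimal sketch of what the boxing looks like in practice: the same sum over a `List<Long>` (one heap object per element, unboxed on every read) and over a `long[]` (one flat block of memory). `BoxingDemo` is a made-up name, and the timing in `main` is a rough indication only, not a proper benchmark.

```java
import java.util.ArrayList;
import java.util.List;

final class BoxingDemo {
    // Each element is a reference to a separate Long object; v is unboxed on every add.
    static long sumBoxed(List<Long> values) {
        long total = 0;
        for (Long v : values) {
            total += v;
        }
        return total;
    }

    // Elements sit contiguously in memory; no per-element objects.
    static long sumPrimitive(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        List<Long> boxed = new ArrayList<>(n);
        long[] primitive = new long[n];
        for (int i = 0; i < n; i++) {
            boxed.add((long) i); // autoboxing: long -> Long
            primitive[i] = i;
        }

        long t0 = System.nanoTime();
        long a = sumBoxed(boxed);
        long t1 = System.nanoTime();
        long b = sumPrimitive(primitive);
        long t2 = System.nanoTime();

        System.out.println("boxed:     " + a + " in " + (t1 - t0) / 1_000 + " us");
        System.out.println("primitive: " + b + " in " + (t2 - t1) / 1_000 + " us");
    }
}
```

Generic collections can only hold the boxed form, which is why this comes up so often in ordinary Java code.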

As will Java's insistence on doing everything in UTF-16.
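For a sense of the size difference, this sketch encodes the same strings as UTF-16 and as UTF-8 and counts the bytes. It measures the encoded forms only; how a particular JVM lays out `String` internally is an implementation detail, and newer JDKs can store Latin-1-only strings more compactly. `EncodingSizeDemo` is a made-up name.

```java
import java.nio.charset.StandardCharsets;

final class EncodingSizeDemo {
    // UTF_16LE is used rather than UTF_16 so that no byte-order mark is counted.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String japanese = "\u65e5\u672c\u8a9e"; // three CJK characters

        // ASCII text: UTF-16 takes twice the space of UTF-8.
        System.out.println(utf16Bytes(ascii) + " vs " + utf8Bytes(ascii));       // prints 10 vs 5
        // CJK text: here UTF-16 is actually the smaller encoding.
        System.out.println(utf16Bytes(japanese) + " vs " + utf8Bytes(japanese)); // prints 6 vs 9
    }
}
```

So the cost depends on the data: for mostly-ASCII workloads (logs, protocols, source code) UTF-16 roughly doubles the bytes touched.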