I've had this argument a few times before. My company is full of javaheads who kept claiming that Java could be faster than C for some specific workloads.
My argument has always been and still is...a language x written in language y cannot by definition be faster than language y.
I understand that languages by themselves aren't fast, but it seems that point should be clear.
> language x written in language y cannot by definition be faster than language y
That’s just... false. Languages compile to machine code, one way or another. Java’s JIT compiler doesn’t create C code, it creates machine code. So it could very well be written in Brainfuck and still, theoretically, beat C.
Agreed. As an existence proof, the optimising compiler for language X could realise that a for() loop that uses 95% of the actual runtime CPU has no side effects and can be removed. There may be some feature or property of language Y that makes that analysis impossible.
In the Java case, the JIT performs a runtime analysis. Without wanting to make any assertions about overall performance here, there exists a non-zero set of programs for which runtime optimisation is superior to static compiler analysis.
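To make the existence proof concrete, here is a minimal Java sketch (class and method names made up) of the kind of loop a JIT can delete outright. Whether HotSpot actually eliminates it depends on the JVM and warm-up, so treat it as an illustration, not a benchmark:

```java
// Minimal sketch: the loop's result is never used or published, so a
// JIT's dead-code elimination is free to delete the whole loop. This is
// the *kind* of case meant, not a guaranteed optimisation.
public class DeadLoop {
    static void work() {
        int sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;               // no side effects, result discarded
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) work();   // warm-up so the JIT kicks in
        long start = System.nanoTime();
        for (int i = 0; i < 10_000; i++) work();
        System.out.println((System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```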
I can second this. C# has had AOT and JIT compilation for years, and the JIT is almost always faster. It is really hard to predict where to optimize, devirtualize and inline if you don’t know the runtime behavior. (Note AOT != ngen; I am talking about the AOT developed for Microsoft’s mobile efforts.)
Static compilers can do PGO as well. In practice, though, the benefits of PGO in languages like C or C++ that rely on static dispatch are small enough that most people don't care, except for programs that are extremely performance sensitive (compilers, browsers, AAA games, etc.) where shaving off a few % can make a difference.
What the JVM wins in PGO, it loses in other areas, and the end result is often still much slower.
But if my understanding is correct, PGO doesn’t help much with “speculative optimizations”. E.g. Java can almost always elide virtual calls when only one loaded class implements a given interface, deoptimizing back to virtual calls if another implementation gets loaded.
No amount of PGO can do these and similar optimizations without self-modifying code, while Java with a JIT is sort of that, in a way.
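A sketch of the devirtualization case (class names are made up; this assumes HotSpot-style class hierarchy analysis): while Square is the only Shape implementation that has been loaded, the JIT can compile the call site as a direct, inlinable call, and deoptimize back to virtual dispatch if a second implementation ever gets loaded.

```java
interface Shape {
    double area();
}

final class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

public class Devirt {
    // Hot method: while Square is the only Shape ever loaded, the JIT can
    // compile this call as if it were ((Square) s).area() -- no vtable
    // lookup, and area() can be inlined into the loop body.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1_000_000];
        for (int i = 0; i < shapes.length; i++) shapes[i] = new Square(i);
        for (int iter = 0; iter < 100; iter++) total(shapes); // warm up the JIT
        System.out.println(total(shapes));
    }
}
```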
You are wrong. The JVM can do runtime optimisations that you won't get from ahead-of-time compiled code. Trivial example: constant-folding of runtime constants. That doesn't mean that code running in the JVM is always faster - most of the time it won't be. But it is possible for particular workloads to run faster than in C.
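Something like this sketch (names made up): DEBUG below is only knowable at startup, so an ahead-of-time compiler can't fold it, but once the class is initialized the JIT can treat the static final as a constant and delete the dead branch.

```java
// Hedged sketch: DEBUG depends on a system property, so it is a *runtime*
// constant. HotSpot can treat initialized static finals as compile-time
// constants in JIT-compiled code and eliminate the branch entirely.
public class RuntimeConstant {
    static final boolean DEBUG = Boolean.getBoolean("app.debug");

    static long work(long x) {
        if (DEBUG) {                       // foldable once DEBUG is known false
            System.out.println("x = " + x);
        }
        return x * 31 + 7;
    }

    public static void main(String[] args) {
        long acc = 0;
        for (long i = 0; i < 100_000_000L; i++) acc = work(acc);
        System.out.println(acc);
    }
}
```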
What if the runtime for language X (written in Y) rewrites X programs into the same language Z that Y itself compiles to? Yeah, there are up-front costs, so you're still technically correct, but I suspect this isn't something you had in mind.
It is what I had in mind. Again, I'm being pedantic. If language X written in Y can rewrite to super-fast Z, then you can, in language Y, write all of language X to mimic the results. Sure it's not realistic, but neither are language comparisons.
If you really believed that argument you wouldn't write C, because it's slower than assembly.
Do you actually write C programs that profile themselves and tune their code according to the input data? If not, then the fact that it's theoretically possible to do that seems pretty irrelevant.
Theoretically, your point stands. But practically, it's not a valid conclusion, unless you're saying it's practical to rewrite all 27M lines of C code in the Linux kernel into optimized assembly for multiple CPU architectures. Or even just for the various x86-64 families.
I'm not sure how valid this comparison is, but PyPy (written in RPython, a restricted subset of Python) is well known for being a much faster implementation of Python than CPython, the main implementation (written in C).
The problem is that different languages do different things at runtime and this is part of the definition of the language. C has almost no runtime, while the JVM is allowed to recompile code at runtime using profiling data it has collected that very run. This is how it can beat C in certain workloads.
This seems a bit wrong-headed. All code ultimately runs CPU instructions, regardless of the language a program is written in. It doesn’t matter if the Java compiler or runtime is written in C, because CPUs don’t run C. The Java bytecode compiler might be written in C, but it ultimately emits CPU instructions.
When talking about performance you say that “there's literally nothing Java can do that C cannot because C can write Java”, but other than the tautology that they are both Turing complete, your conclusion doesn’t follow.
The Java runtime dynamically optimises based on the currently executing code, but C compiles to static CPU instructions. A C program is statically optimised so it’s possible that some optimisations in C will perform worse than those in Java because Java knows more about the actual code paths taken at runtime, and this can change between executions.
For C to be able to optimise itself the same way that Java does, it would need its own heavyweight runtime and bytecode, like Java has (as I think you suggest). But C doesn’t actually have such a runtime, and although I’m certainly no expert on the C language spec, I assume it has all sorts of runtime guarantees and a memory model that would preclude arbitrarily rearranging the executable code at runtime, so it’s arguable that the resulting system wouldn’t actually be “C” if it did this.
Put another way: although Java is/was written in C, the way Java works is fundamentally different to C, and because of this, certain optimisations are available to Java that are not available to C.
(All of that said, in my experience C code is always faster than Java code, these days I use Go and Swift, and I’m very happy to have seen the back of the JVM.)
This is what I used to think about JIT as well: that a JIT would have more information than a static compiler and could therefore optimise the code better. However, it seems to me that most of the performance depends on how well we can exploit the cache lines and how many fewer instructions the CPU has to execute to get the job done. JIT could perhaps improve on the latter, but in many cases it seems very hard to be cache-aware when programming in Java, not to mention that every user-defined type is allocated on the heap. So I feel it's quite unlikely that there are Java programs that outperform C, and even if there were, I doubt the same couldn't in principle be done better in C.
Yeah I don’t disagree with you, I was really trying to dispel the idea that somehow the implementation language puts an upper limit on the performance of the implemented language.
So, consider a perfectly spherical Java language… :)
Actually, Java gets ridiculously close to C performance when you only do computations on primitives. Of course you do have some pointer chasing in most code bases but in my experience cache is mostly relevant for repetitive workloads. I’m not sure that a typical web application would win all that much from a C rewrite.
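For illustration, a rough sketch (not a proper benchmark - timing with System.nanoTime like this is skewed by warm-up and GC): summing a long[] stays in registers and contiguous cache lines, while summing a List<Long> chases a pointer per element.

```java
import java.util.ArrayList;
import java.util.List;

// Contrived sketch: primitive loop vs boxed loop over the same values.
public class PrimitivesVsBoxed {
    public static void main(String[] args) {
        long[] prim = new long[1_000_000];
        List<Long> boxed = new ArrayList<>(prim.length);
        for (int i = 0; i < prim.length; i++) { prim[i] = i; boxed.add((long) i); }

        long t0 = System.nanoTime();
        long s1 = 0;
        for (long v : prim) s1 += v;          // tight primitive loop
        long t1 = System.nanoTime();
        long s2 = 0;
        for (long v : boxed) s2 += v;         // unboxing + pointer chasing
        long t2 = System.nanoTime();

        System.out.println(s1 + " in " + (t1 - t0) / 1_000 + " us (primitive)");
        System.out.println(s2 + " in " + (t2 - t1) / 1_000 + " us (boxed)");
    }
}
```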
Java is not being rewritten to C though. It is being rewritten to assembly (by a C++ program).
----
Also, by the same argument you're using, git should be faster if written in assembly, since assembly can do more than C can.
The tradeoff between performance and ease of writing the program has already been made by writing git in C. The only difference in using Java is picking a different point in that tradeoff.
I'd also contend that Java gives you much more bang for the buck in that tradeoff than C. Java is stupidly easy to write, especially if you want to do multithreading. With C you'd have to start writing some form of primitive GC, or have the disciplined paranoia of Rust's borrow checker yourself.
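As a small illustration of the multithreading point (just a sketch using the standard parallel streams API): a parallel sum where the thread pool, work splitting and joining are all handled by the runtime.

```java
import java.util.stream.LongStream;

// Sketch: a parallel reduction in one expression; the fork/join pool,
// work splitting and result combination are all handled by the runtime.
public class ParallelSum {
    public static void main(String[] args) {
        long sum = LongStream.rangeClosed(1, 1_000_000_000L)
                             .parallel()
                             .sum();
        System.out.println(sum);
    }
}
```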
If I write an assembler in C, I could use it to write object files with fancy machine code that the C compiler would never choose. Same with HotSpot, it’s not limited by the code generator it was bootstrapped from.
“Does my C compiler generate better code than HotSpot?” is a different question than “could I conceivably write a better code generator from scratch, in any language?”
Theoretical max performance is different from actual performance. By that argument, since all code on a given machine ends up as machine code, all languages are the same speed, which is obviously false.
But not all languages emit the same machine code. And I think performance is more sensitive to memory access patterns than to raw instruction counts, so I don't think the point holds.
The point is that a language can be faster than the language it is written in, in whole or in part, depending on how it is used. Saying "the theoretical performance of Java can never be faster than the theoretical performance of C" is a tautology and pretty useless, especially if in practice the Java libraries handle common operations better than the average C programmer does.
Ok I see. I still find that this kind of argument relies on hidden assumptions. By this argument we are saying that Java doesn't actually exist, and it's all C with a fancy DSL. :-)
You could maybe write some kind of translator in C to emit JVM bytecode, and have it recompile itself at runtime. The problem with that line of argument is that it is a lot of work, and then to be fair we should allow a similar effort to be applied to a JVM like HotSpot for an apples-to-apples comparison.
Then remember the original thesis, which is that we claimed Java was only faster for specific use cases. That is a very wiggly claim that I bet could nearly always be found true.
It actually runs slower as it is, since it starts as an interpreter and the compiler is yet to be optimized to GCC levels; but it has a great compiler infrastructure that could eventually JIT to faster code (it being runtime compilation, with statistics, and Graal being capable of partial evaluation).
I was merely pointing at a hole in your argument: the language of the compiler has absolutely no influence on the quality of its output (again: the JVM compiles to assembly, not to C++).
A compiler written in Brainfuck could emit code that is more performant than what gcc emits.
The language does have an influence on compilation speed, though. But does that matter in your argument?
That's not the statement you made. A safer statement would be, "a simple interpreter for language x, implemented in language y, cannot be faster than language y".
My statement is that language speed comparisons are stupid, but if you're willing to make one, you cannot beat the language you're written in. That's like saying Shakespeare writes better stories than...English.
Even if that position turns out to be correct, the argument is deeply flawed. It hinges on a "by definition" where there is no definition. That hand-wave is unsound. Java/the JVM has access to runtime performance information about how code is used that a static compiler doesn't have. There is no reason it can't outperform C. More information available might lead to better results, and sometimes does.
> Did you also implement object pooling for the Java variant (commonly used in high perf apps)?
In this specific case I don't think you need to; I've seen generated code (from Java sources) simply reuse an object in a tight loop. IOW, it doesn't allocate new memory for the instance on each iteration of the loop. The memory for the instance is allocated once and then reused.
(For a small allocation (a small instance) I would expect a smart compiler to not allocate anything and simply create the for-loop instance on the stack).
The optimization you are getting at has not much to do with object size, but subsequent usage. If the object reference escapes, it has to be allocated on the heap. Value semantics could/will help here.
> The optimization you are getting at has not much to do with object size, but subsequent usage.
Size plays a part: it determines whether or not an instance first gets allocated on the heap or the stack[1]. Heap allocation gets expensive in a tight loop.
> If the object reference escapes, it has to be allocated on the heap. Value semantics could/will help here.
The assumption is that we are talking about local-only data objects (not returned or outliving the scope). Forgive (and correct) me if I am under the incorrect assumption.
[1] I'd expect a smart compiler to do this: a data object that requires 1MB should at no point be on the stack, while a data object that requires 32 bytes has no business invoking the allocator, causing a context switch to the kernel that faults in a new page. The specific thresholds depend on the runtime and OS support.
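A sketch of the non-escaping case being discussed (the Point class is made up). One caveat on terminology: HotSpot doesn't literally stack-allocate objects; its escape analysis does scalar replacement, which has the same effect of skipping the heap entirely.

```java
// Hedged sketch of escape analysis / scalar replacement: p never escapes
// distance(), so the JIT can avoid the heap allocation and keep x and y
// in registers.
public class EscapeDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double distance(double x, double y) {
        Point p = new Point(x, y);          // candidate for scalar replacement
        return Math.sqrt(p.x * p.x + p.y * p.y);
    }

    public static void main(String[] args) {
        double acc = 0;
        for (int i = 0; i < 100_000_000; i++) acc += distance(i, i + 1);
        System.out.println(acc);
    }
}
```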
For these sorts of micro-benchmarks you can usually attain interesting results with Java. I once constructed one specifically designed to show how the JVM's ability to inline and un-inline code could make certain programs much faster than C.
> My argument has always been and still is...a language x written in language y cannot by definition be faster than language y.
While this may be true in theory, in practice it often isn't, because there are usually time constraints.
Working in a low-level language Y can take much more time and effort than a high-level language X. This means you can often profile and iterate on the program design to improve performance much faster in X than in Y. Worse, if you have optimized a program in Y around algorithm A, it is much harder to switch to optimizing around algorithm B, because "optimization" almost always makes program logic more complex and removes clarity. Sometimes so much so that you may not even try an alternative! Such increased complexity can also happen in an HLL program, but a modular program can often be rewritten more easily.
A trivial example: C strings are zero-terminated but don't store their length, so it is necessary to count characters every time the length is needed.
If you implement, say, a Pascal compiler in C, it can be faster than C because Pascal keeps the length of the string with the string.
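To illustrate, in Java for consistency with the other sketches (a contrived example): a C-style length has to be discovered by scanning for the terminator, while a Pascal-style string, like java.lang.String, just reads a stored field.

```java
public class StringLength {
    // O(n): walk the buffer until the 0 terminator, like C's strlen().
    static int cStyleLength(char[] s) {
        int n = 0;
        while (s[n] != '\0') n++;
        return n;
    }

    public static void main(String[] args) {
        char[] cString = {'h', 'e', 'l', 'l', 'o', '\0'};
        System.out.println(cStyleLength(cString));   // 5, found by scanning
        System.out.println("hello".length());        // 5, read from a stored field
    }
}
```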
A language that does not permit pointer arithmetic can optimize memory layout on the fly, based on real-world program performance, and that may make it faster than the language it is written in.