I almost skipped this link; I assumed it was typical boring blog noise. It's not.
This is an insightful post from the git mailing list which shows some of the real limitations that a top tier developer hits when trying to write Java code as fast as neatly optimized C code. Definitely worth reading.
Yep. The usual "Program X is faster in C than Java" gets a barrage of "That's because you know C better". Shawn is a performance-obsessed Java expert, Eclipse committer and longtime Google coder who works on JGit. If he says Java is slower than C at this, then Java is slower than C at this.
EDIT: but as wcoenen points out, this was written in 2009 and Java 1.7 does a better job with some of this.
> "If he says Java is slower than C at this, then Java is slower than C at this."
Yes. I like how you qualify the statement. Furthermore, on the C side you have Linus and other C gurus that really, really know how to exploit the strengths of the C language.
I would also expect them to know how it works _all_ the way down through the OS, which can also have an impact on performance. I.e., they know how the supporting systems operate and can therefore make further assumptions/optimisations.
He's an expert you say? That was certainly not my expectation from the article.
(1) Blind faith in Generics.
This alone screams newb to me. He says that he got better performance with a custom data structure (no shit sherlock) but then seems deeply surprised by this. Duh. Okay, well, obviously he's relatively new to Java, but hey, he could still be a performance expert.
(2) Never mentions the biggest weapon in Java's arsenal in the C vs Java argument. C can only make optimisations at compile time, but some optimisations can only be made at run time, not at compile time. So Java starts off behind, but can catch up some, or even all, of that distance.
People have used this to demonstrate Java code running faster than C code, but that is old news, a newb might not know this.
(3) Does not mention the second biggest weapon in Java's arsenal in the Java vs C speed argument: more recent garbage collectors allow for super fast memory allocation - enormously faster than what you get with malloc.
(4) Never quantifies how much slower Java is. If Java is 5 or 10% slower, then Meh, is that really news? If Java is 2x slower than carefully hand-tuned C by guys with actual code writing credentials like Linus, then that is still pretty good. 2x screamingly fast is still more than good enough for most people. If the difference is an order of magnitude, then that is not so good. If the difference is two orders of magnitude, then you might as well be using some scripting language.
A proper expert would certainly have quantified the speed difference.
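For what it's worth, point (3) is easy to see in a rough sketch. This is illustrative only, not a rigorous benchmark (no JMH, no warmup control), and HotSpot's escape analysis may even remove these allocations entirely:

```java
public class AllocDemo {
    static final class Node { final int v; Node(int v) { this.v = v; } }

    // Allocate n short-lived objects; in HotSpot each `new` is typically
    // just a pointer bump inside a thread-local allocation buffer (TLAB).
    // Caveat: escape analysis may eliminate these allocations entirely.
    public static long allocate(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += new Node(i).v;
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long t0 = System.nanoTime();
        long sum = allocate(n);
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(n + " allocations in " + ms + " ms, sum=" + sum);
    }
}
```

Ten million heap allocations typically complete in tens of milliseconds on a modern JVM; a malloc/free pair per object would generally cost more.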
I believe he is simply recounting his experience while porting Git to Java. Nowhere did I find him claiming he's an expert. What he did was share his experience from the trenches.
If you are the expert here, I would assume he wouldn't mind if you have any insights to tune the performance of JGit even further. You can start by joining the mailing list.
He wasn't quantifying anything since he was not answering your question, he was sharing his experience.
It's truly liberating to get back to C after having programmed in something higher level for a long time. It actually makes me appreciate C more. I mean, at first, I shoot myself in the foot, elbow and groin with alarming regularity, but it's nice to actually have access to the bullets. :) [EDIT: dodgy grammar]
I wonder why none of these benchmarks measure the performance of _optimized_ Java? All of our builds use Proguard optimization and some of the resulting bytecode is noticeably faster.
All the points are valid but they are peculiar to Java, not to all managed high-level languages. C#/.NET, for example, have unsigned types, value-type arrays and structs, memory mapped files and specialized collections.
As an example, the C# port of SQLite is sometimes faster than the C version on queries, although updates are slower, even though SQLite is a highly optimized C library.
It's also worth pointing out that the C# port of SQLite omits using certain C mechanisms (like pointers) in favor of passing copies of byte arrays around. You'd expect this to make it slower, but in many cases, it doesn't! (C# supports pointers, but the port doesn't use them so that it'll work in limited environments like Silverlight)
Money quote about half-way down the page: "SQL statements compile into virtual machine code"
Every SQL database engine compiles each SQL statement into some kind of internal data structure which is then used to carry out the work of the statement. But in most SQL engines that internal data structure is a complex web of interlinked structures and objects. In SQLite, the compiled form of statements is a short program in a machine-language like representation. Users of the database can view this virtual machine language by prepending the EXPLAIN keyword to a query.
The use of a virtual machine in SQLite has been a great benefit to the library's development. The virtual machine provides a crisp, well-defined junction between the front-end of SQLite (the part that parses SQL statements and generates virtual machine code) and the back-end (the part that executes the virtual machine code and computes a result.) The virtual machine allows the developers to see clearly and in an easily readable form what SQLite is trying to do with each statement it compiles, which is a tremendous help in debugging. Depending on how it is compiled, SQLite also has the capability of tracing the execution of the virtual machine - printing each virtual machine instruction and its result as it executes.
Off-topic but relevant: I've been looking for info on just how database software does what it does, i.e. from parsing the SQL query to hitting the disk, and everything in between. Anyone got any links or books?
Gray and Reuter's Transaction Processing is absolutely superb, and it covers a good part of this stack, everything lower-level than query planning. However, it covers every approach to doing this. If you want to know how Postgres, say, does it, that's a lot less information, and may be more digestible.
Thanks. I am interested in the general theory of how every database does it; it bothers me that this area of software engineering is terra incognita to me.
The sqlite source is well put together and pretty small. The (now pretty much obsolete) sqlite 2.x source was a lot smaller and might be a better starting point for pure learning purposes.
Bonus: in the sqlite command line tool, putting "EXPLAIN " before a query will dump out the VM commands that the query was converted to, without executing them.
Any decent textbook on databases would cover most of this, with the possible exception of parsing SQL. It will likely have an in-depth discussion of data structures to reduce disk access.
I strongly recommend you read one such book more or less end-to-end.
HN: As with other areas of computer science, there is often a "canonical" book, like TAOCP or CLRS on algorithms. Is there any such book for databases?
Things have changed a bit since then, but not much. SQLite's API is very different than regular databases because it is a library operating in the same process. In particular it does not calculate all result rows for a query up front (that wouldn't be very 'Lite') but instead calculates the next matching row as you ask for it.
Consequently the internals have to be able to record their state, return a row, and then resume from that state to get the next matching row. There is also a fair amount of query optimisation that goes on, which again means the need for expressing queries in a variety of different building blocks. Combine the state machine with building blocks and you have a special-purpose VM.
> In particular it does not calculate all result rows for a query up front (that wouldn't be very 'Lite') but instead calculates the next matching row as you ask for it.
If you squint really hard you could make that case, but in SQLite they are really not the same. You cannot use SQL syntax and you can't do other operations on cursors other than read the columns for the matching row.
Virtually all other database engines calculate the query results up front. It is more efficient in their implementations to do it that way. For example, they also have calls to ask how many rows remain in the results. SQLite has no such API; the only way to find out is to actually retrieve each result row.
I heard about related stuff happening while I was working at a Smalltalk vendor. The programmer who was implementing the network cryptography library would just call up the VM engineer and ask for goodies like support for large bit arrays, and he'd get them in the next VM release. The programmer was able to beat some of RSA Data Security's (poorly implemented) reference DLLs written in C by 3% with a Smalltalk program.
In the Smalltalk environments, it's easy to implement "primitives" coded in C, even if you aren't internal to the vendor. With open VMs like Squeak, you can add your own bytecodes to support optimizations if you want to.
Slightly offtopic, but I wonder how much overhead in those benchmarks comes from calling native code from .NET runtime? An interesting data point could be benchmarking equivalent implementation in C or C++, avoiding the overhead of native-managed transition.
In my experience, interfacing native code via C++/CLI has negligible overhead, unless arguments are big and complex data types, which have to be converted to .NET types.
Otherwise, unsafe regions have practically zero overhead, and they are close enough to the metal.
This is an old email... there have been many improvements to JGit, Java/the JVM and other areas of interest since then.
Shawn and I gave a presentation at the Googleplex not so long ago about JGit [1]. In particular, you may be interested in the 'JGit at Google' section.
There are some cases where JGit is faster than CGit, but the main benefit of JGit is that it's easy to embed. There are projects like gitblit and various IDEs that use the library. On top of that, you have crazy folks like NGit [2] who cross-compile the library using Sharpen so it can be used by the .NET community...
Total speculation... but maybe because C git clone always reads from local disk (?). JGit clone appears to read from Bigtable/GFS, and those systems have in-memory caches, or columns can reside entirely in memory. Also, you could probably exploit I/O parallelism with a cluster of servers, whereas with local disk you are probably limited by a single disk head that has to move around.
So I doubt it has anything to do with Java, but rather the underlying storage. If I'm wrong I'd also like to hear about it!
Many of us have already read this and it's been submitted to HN several times before. This, of course, does not mean that it's not worth reposting, but interested parties may want to dig up some of the past discussions.
I find it kind of interesting that in Haskell, which is arguably even higher level than Java, most of these optimisations are eminently possible.
EDIT: This obviously came across a bit as language fanboyism, so I guess I should mention that the language features that let you do many of them let you shoot yourself in the foot just as easily as you can in C, and you can certainly argue that with a strong FFI you might as well just call into C if you really need that kind of low-level performance.
It seems that for almost any popular piece of C or C++ software there are people who are motivated to produce a pure Java implementation. For whatever reason, you don't see that motivation in other language communities. Outside of Java-land, most feature-for-feature copies of existing software seem to be undertaken for the sake of learning or linguistic patriotism, which are not sufficient drivers to sustain such a project to completion.
My guess is that this phenomenon reflects the fact that other language communities have greater comfort and facility with C libraries, or to look at it another way, the fact that complete independence from native libraries is actually a feasible goal for most Java projects.
No one uses Haskell, or no one uses those optimisations?
Haskell has plenty of industrial users and quite a few very large programs as well. Those libraries hardly look mature - I know that many of the container libraries in Haskell make extensive use of unpacking, for instance.
I've heard the argument before that, when needed, one can use an FFI to optimize bottlenecks in high-level code, but I've never understood it.
Won't using a high-level language incur an omnipresent speed slump? And even if a bottleneck exists, how would using an FFI remedy crucial problems in the language, like the absence of unsigned types or the fact that all types are boxed? The types will have to be unboxed anyway, so whether that happens in foreign code or in the interpreter/JIT code won't matter.
At least in Haskell you have unboxed primitive types, memory mapped IO, bump-pointer allocation, and compilation to direct loops that are often identical to what GCC produces (or very close).
> Won't using a high-level language incur an omnipresent speed slump?
Yes, but most programs don't require high performance everywhere - in a library like JGit for instance, most operations are probably plenty fast written in Java even for very large projects; it's likely only a few are problematic.
> And even if a bottleneck exists, how would using a FFI remedy crucial problems in the language, like the absence of unsigned types or that all types are boxed.
That's maybe an argument to allow more control over memory layout and machine representation in high level languages - although there are ways around this, like defining your data types as a C++ class and then providing a high level binding.
> Won't using a high-level language incur an omnipresent speed slump?
Not sure that is true. Just look at PyPy (http://pypy.org/), which claims that its run-time optimizations let it outperform the C implementation of Python, quite significantly in many cases. So I don't think it's true that high-level languages are always slower. It has a lot to do with the optimizations you can do at run time. There is also an interesting paper on an OS based on run-time code synthesis for optimizing performance (http://valerieaurora.org/synthesis/SynthesisOS/). The major drawback of languages like C is that they can only optimize at compile time. I think as projects get larger and we move towards parallel structures and algorithms, the need for languages that support run-time optimizations will be greater.
You're right that the FFI can create significant friction, but once you're in C-land, you get C-level performance. So you need to move whole algorithms into C. In an O(n²) algorithm, the O(n) FFI friction will be negligible for a large enough value of n.
> like the absence of unsigned types or that all types are boxed
It isn't always that straightforward. With Java, if you move your code into C you may also need to keep all of your data in C-land to avoid the overhead of copying it back and forth. Then the data is harder to access from Java, plus you can't rely on garbage collection to free that memory when you're done with it.
I build fairly high-performance Java code, and get hit with three major gotchas which prevent it from approaching C speed.
- There's no way to do array access without null pointer and index checks each and every time.
- Generics with basic types, and their unfortunate embedding into syntax (like the new for() syntax), are awful. Boxing and unboxing incur a ludicrously high penalty, and generics push coders away from using arrays. Unlike in C++, generics have been the enemy of performance.
- Poor quality collections classes (ArrayList and HashMap are notoriously bad)
Sure there's a few other things like pointer walking etc. in C, and Java's poor floating point, but the big three above are the killers.
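The boxing point above is easy to demonstrate. A rough, hedged sketch (no JMH, no warmup control, so treat the timings as illustrative only):

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingDemo {
    public static long sumBoxed(List<Integer> xs) {
        long s = 0;
        for (Integer x : xs) s += x;  // an unboxing per element, plus pointer chasing
        return s;
    }

    public static long sumPrimitive(int[] xs) {
        long s = 0;
        for (int x : xs) s += x;      // dense, cache-friendly, no boxing
        return s;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> boxed = new ArrayList<>(n);
        int[] prim = new int[n];
        for (int i = 0; i < n; i++) { boxed.add(i); prim[i] = i; }

        long t0 = System.nanoTime();
        long a = sumBoxed(boxed);
        long t1 = System.nanoTime();
        long b = sumPrimitive(prim);
        long t2 = System.nanoTime();
        System.out.printf("boxed: %.1f ms, primitive: %.1f ms, sums equal: %b%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6, a == b);
    }
}
```

The primitive loop is typically several times faster, mostly because the boxed version scatters one million small Integer objects across the heap.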
> - There's no way to do array access without null pointer and index checks each and every time.
Actually there is. Have you checked out the sun.misc.Unsafe class? Lots of very dangerous gems in there, among them the ability to calculate array offsets and access the array elements directly. (check out arrayBaseOffset + arrayIndexScale + getObject/getLong/etc)
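A hedged sketch of that Unsafe pattern (the helper name is mine). Caveats: the JIT often eliminates bounds checks on its own, so measure before relying on this; `sun.misc.Unsafe` may warn or be restricted on newer JDKs; and a bad index here reads arbitrary memory or crashes the VM:

```java
import sun.misc.Unsafe;
import java.lang.reflect.Field;

public class UnsafeArrayDemo {
    static final Unsafe U;
    static {
        try {
            // The usual reflective grab of the singleton instance.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Read a[i] without the bytecode-level null/bounds checks.
    // DANGEROUS: an out-of-range i reads arbitrary memory or crashes the VM.
    public static long readLong(long[] a, int i) {
        long base = U.arrayBaseOffset(long[].class);
        long scale = U.arrayIndexScale(long[].class);
        return U.getLong(a, base + (long) i * scale);
    }

    public static void main(String[] args) {
        long[] a = {10L, 20L, 30L};
        System.out.println(readLong(a, 2));  // 30
    }
}
```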
I would also add the inability to create objects on the stack, if you are doing anything recursive. The overhead of heap object creation is quite visible. So I had to either reuse objects and essentially create my own memory management layer, or try to stick data into primitive types, which obfuscated the code logic quite a bit.
Java 7 enables escape analysis by default, which can replace heap allocation with stack allocation (scalar replacement). So, if you play along and write code for which the escape analyzer can kick in (I don't know exactly what the rules are), you can get those benefits.
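A sketch of the kind of code that analysis can typically help with. The class and method here are hypothetical, and whether scalar replacement actually fires depends on the JVM version and flags:

```java
public class EscapeDemo {
    static final class Vec {
        final double x, y;
        Vec(double x, double y) { this.x = x; this.y = y; }
    }

    // 'v' never escapes this method, so HotSpot's escape analysis can
    // scalar-replace it: the fields live in registers and no heap
    // allocation happens once the method is JIT-compiled.
    public static double length(double x, double y) {
        Vec v = new Vec(x, y);
        return Math.sqrt(v.x * v.x + v.y * v.y);
    }

    public static void main(String[] args) {
        double sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += length(3, 4);
        System.out.println(sum);
    }
}
```

Returning or storing `v` anywhere would make it escape and force a real allocation.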
I'd just like to point out that your complaints are specific to the Oracle JVM (except for the new for() syntax obviously).
Even the collection classes can change between JVM versions. I'm not saying that they're all necessarily better, just that different versions are different. So the Dalvik or IBM J9 versions might do what you want better.
I had a similar experience when I was doing some Galois Field arithmetic in Java. You pay a huge penalty because of the absence of unsigned types. In our case we had to use long instead of int, which is extra costly, since many basic operations in Java return int by default.
I was doing it in GF(2^32-5). Your statement is true for GF(2^n) where n is small enough to keep the entire multiplication table in memory (usually n <= 8). When it's bigger you keep log tables in memory, and then sign matters. However when n=16 you get lucky and can use char as an unsigned 16-bit int.
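For anyone unfamiliar with the workaround: Java has no unsigned byte or int, so you mask to recover the unsigned value, and char is the one unsigned (16-bit) integral type. A small illustration:

```java
public class UnsignedDemo {
    // Java bytes are signed; masking with 0xFF recovers the unsigned value.
    public static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;
        System.out.println(b);              // -16: bytes are signed
        System.out.println(toUnsigned(b));  // 240: the unsigned interpretation

        // char is Java's only unsigned integral type (16 bits), which is
        // what makes it usable as a GF(2^16) element as described above.
        char c = (char) 0xFFFF;
        System.out.println((int) c);        // 65535
    }
}
```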
More specifically, the problem of trying to write a binary-compatible Java implementation of a neatly optimized solution written in C. So, the program in question is executed, reads a whole bunch of binary data from a whole bunch of different files, does some calculations on that data and then exits.
The questions are, if you had to develop a distributed version control system in java: a) would you solve it the same way, b) would your solution be faster or slower, and c) would it take more or less time to write it and be easier to maintain?
Clearly you would not solve it the same way; for example, it might stick around in memory as you worked. Could it then appear faster, from a user's perspective? Possibly. Might it be easier to maintain? Also possible.
Pretty much by definition, if you are writing it in C, a binary-compatible solution is not going to run as fast if you port it to Java.
I don't think that is a conclusion that has much value.
Because cgit is a bunch of binaries that expect to call each other. That makes it harder to abstract out the storage layer, and we don't use vanilla repositories sitting on a filesystem. Things are backed by some other storage abstraction, which isn't always very posix-filesystem like.
There was a google talk on this posted to HN recently, but I can't find it. In it, one of the directors of the build / testing / code review system at google was talking about how they get things working at scale. Since everyone works out of the HEAD of one Perforce repo, they end up using the map-reduce infrastructure to perform tests in the cloud for each checkout. In line with this, there are too many files, that update too often for every developer to be checking out of the repo, so they use a custom FUSE filesystem to lazily give access to files only when they're needed.
The original goal of JGit/EGit was to provide an Eclipse plugin for working with software using the Git SCM. The Eclipse plugin is still the main goal of many of the developers, but we are open to anyone wanting to interface with other tools: NetBeans, Ant, Maven etc. For those, the JGit part provides a high-performance API for working with Git repositories. The main other user of JGit, besides EGit, is Gerrit Code Review, which is used by projects such as JGit (of course), EGit (by implication) and Android.
Not sure if that provides a very good motivation, but there you go. :)
"But, JGit performs reasonably well; well enough that we use internally at Google as a git server."
He is not advocating to never use Java. He is just pointing out some things that may or may not be important when choosing a language to write a program.
It's a Java library, making it much easier to use from Java code than a command-line utility and all the ensuing parsing and munging of string data (which is not Java's forte either).
There are a few things that require JGit: a while back, I found a really nice library (I'd look it up, but am on 3G on a train) that lets you use Amazon S3 as a git remote. The author had just written some bridge code between JGit and an AWS library. Works surprisingly well, you just have to remember to use `jgit push` rather than `git push`.
A lot of this is poor API design, and the product of Java's baggage as something that needs to have well-defined safety semantics for internet applications. It is not a necessary constraint of high-level languages that they don't offer the ability to get down to the metal. SBCL, for example, offers a lot of mechanisms for unboxed primitive arrays, unsafe declarations, and these days even SSE intrinsics.
I've had the issue using maps with primitive keys. I solved it by isolating the performance critical functionality and not using the Collections framework there, instead writing my own data structure for it (with heavy influence from the hashmap one).
This tends to be my general philosophy, by the way. Reuse code to get something working fast, isolate what really causes bad performance, then solve only those problems by going under the hood. If performance issues remain, cheat by pretending they don't exist: make sure you're never in the worst-case scenario, and handle the worst-case scenario differently.
In my "IntHashMap" case, the worst-case scenario was gathering the keySet. I made sure that I'd only call it when I really, really needed it. The rest was "fast enough" once I had removed the underlying Integer object on the key.
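A hypothetical minimal sketch of such an int-keyed map, using open addressing with linear probing. This is not the commenter's actual IntHashMap, just the general shape of the technique: primitive arrays instead of boxed Integer keys and Entry objects:

```java
public class IntIntMap {
    private int[] keys;
    private int[] values;
    private boolean[] used;
    private int size;

    public IntIntMap(int capacity) {
        // Round up to a power of two and keep the load factor under 0.5.
        int cap = Integer.highestOneBit(Math.max(16, capacity) * 2);
        keys = new int[cap];
        values = new int[cap];
        used = new boolean[cap];
    }

    private int slot(int key) {
        int mask = keys.length - 1;
        int i = (key * 0x9E3779B9) >>> 16 & mask;  // cheap multiplicative mixing
        while (used[i] && keys[i] != key) i = (i + 1) & mask;  // linear probing
        return i;
    }

    public void put(int key, int value) {
        if (size * 2 >= keys.length) grow();
        int i = slot(key);
        if (!used[i]) { used[i] = true; keys[i] = key; size++; }
        values[i] = value;
    }

    // No boxed nulls possible, so the caller supplies a "missing" sentinel.
    public int get(int key, int missing) {
        int i = slot(key);
        return used[i] ? values[i] : missing;
    }

    private void grow() {
        IntIntMap bigger = new IntIntMap(keys.length * 2);
        for (int i = 0; i < keys.length; i++)
            if (used[i]) bigger.put(keys[i], values[i]);
        keys = bigger.keys; values = bigger.values; used = bigger.used;
    }
}
```

Note that `get` never allocates, which is exactly what you cannot say about `HashMap<Integer, Integer>`.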
Well, with a MappedByteBuffer (or any DirectByteBuffer), if you want to manipulate the data as a Java type (e.g. byte[]) you have to copy the data into the heap. byte[] cannot exist outside of the heap.
Still, I wonder why they're using a MappedByteBuffer in the first place if they're working with the data in the Java heap.
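To make the copy concrete, a small sketch (the helper name is mine) showing that getting a `byte[]` out of a MappedByteBuffer necessarily copies into the heap:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    // Map the file, then copy its contents into a heap byte[].
    // The copy is unavoidable if downstream code needs a byte[]:
    // Java arrays cannot point into the mapped region.
    public static byte[] mapAndCopy(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] heap = new byte[(int) ch.size()];
            buf.get(heap);  // bulk copy out of the mapping into the heap
            return heap;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap", ".bin");
        Files.write(tmp, new byte[]{1, 2, 3, 4});
        byte[] data = mapAndCopy(tmp);
        System.out.println(data.length + " bytes, first=" + data[0]);
        Files.delete(tmp);
    }
}
```

If the code instead reads fields directly via `buf.getInt(offset)` etc., no copy is needed, which is presumably the win they were after.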
> So. Yes, its practical to build Git in a higher level language, but you just can't get the same performance, or tight memory utilization, that C Git gets. That's what that higher level language abstraction costs you. But, JGit performs reasonably well; well enough that we use internally at Google as a git server.
I think that this is the key takeaway for the entire post.
One of the reasons I generally dislike any of the "X IS BETTER THAN Y" bakeoffs is that performance is now so implementation dependent that these comparisons are pretty much moot. Given that basically any non-trivial implementation can be improved, it's difficult to say that anything is faster, especially when one considers developer skill.
Developers should not be chasing the abstract, absolute best performance. Instead, the language used should be the one that delivers performance that is good enough for their client's needs. If they can get it with something they're familiar with, that's great. If they need to learn a new tool, that's also good. But it doesn't make much sense to throw away all the knowledge that a developer has about a certain language to chase "better performance" with a different one. Most likely, the first-effort implementations in a new language won't be nearly as good as the implementations in the more familiar language.
It's generally true that optimized Java won't ever be as fast as optimized C. But for the vast majority of cases, it doesn't need to be. Java's speed is enough for those cases. And in the small minority where it's not sufficient, C is still around.
I'm sure there would be some speedup, the question is whether it would be worth it (and I suppose that can only be adequately be assessed by the developers, who now have to maintain C code instead of Python).
But for some perspective from a former Mercurial developer: lots of the more performance-sensitive code has already been rewritten in C. Rewriting the rest of it would simply be a question of diminishing returns. One thing that would improve is hg's startup time; starting up Python just takes a while, which kind of sucks for command-line programs like VCS clients that tend to have many short-running invocations.
> One thing that would improve is hg's startup time; starting up Python just takes a while
Python starts up very fast for a language runtime (much much faster than Java). But yes, if you run a large amount of extremely short tasks, the startup might become significant I guess.
I remember a post on here recently which said that sometimes a high-level language can be faster than C, because you can convey more of your algorithmic intent and thus the compiler can optimize better for you.
It gave an example where the compiler's knowledge that something is an immutable array means better optimization. Which you can't express in C.
In cases where performance actually matters, just avoid bit-twiddling in high-level languages. It sucks too much. You'll probably waste less time on optimizations by offloading the biggest bottlenecks to C/C++ with the native/extension interfaces in your high-level language of choice.
Be careful and stay standards-compliant and you can keep most of the portability and maintenance advantages while picking up some significant speed.
>You'll probably waste less time on optimizations by offloading the biggest bottlenecks to C/C++ with the native/extension interfaces in your high-level language of choice.
In reality this often requires a heavy refactor to actually work. In Java with JNI, for instance, the overhead of calling native methods is actually rather high, over 200 CPU cycles in many cases. The stack often has to be re-arranged, a CPU stall is usually caused, and most data types passed to a native function have to be copied (last I knew, java.nio buffers were the only types that weren't copied).
Point is, just moving your "hot function" to C / C++ and calling with JNI doesn't work unless that function is rarely called and does a lot of work internally. More often the "hot function" is something that is called thousands of times and moving something like that to JNI is just as likely to kill performance as help it. You'd have to abstract away an entire module of work and minimize its call surface to JNI to achieve your goal.