The counter argument here is something like Lucene, which is written in Java but has been heavily optimized over the years using many of the same kinds of tricks the Git people used to optimize Git. There are frequent attempts to replicate parts of what Lucene does in different languages; usually under the assumption that it will be faster and better because of magical properties people associate with things like native compilation or the supposed manual memory management skills of the programmers doing the work.
Most of these efforts are relatively niche compared to Lucene because they never quite catch up in terms of features, scale, etc. and because the programmers involved are messing around on the fringes of the problem space instead of coming up with algorithmic breakthroughs, which at this point is the quickest way to make things faster. Well, that and cutting some major corners and pretending it's all the same by running some silly benchmark.
The fallacy here is confusing language, idioms, memory models, frameworks, etc. and assuming it's all set in stone. It isn't. Just because you are using Java does not mean everything has to be garbage collected, for example. Lucene actually uses memory mapped files, byte buffers, etc. for a lot of things. So, it does not actually need to do a lot of garbage collecting. It uses the same kinds of solutions you'd pick when using C. And they perform in the same ballpark as you'd expect them to, meaning that unless you improve the algorithms, you are not going to be magically a lot faster. The same is true of HotSpot, the JVM's runtime compiler, which is written in C++ and, surprise, uses a lot of the same kind of trickery used by the fine people working on e.g. LLVM. So, a lot of the things people assume must surely be slower just aren't, and a lot of the bottlenecks people assume must surely be insurmountable actually have well-known ways of being worked around.
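To make the memory-mapped-file point concrete, here is a minimal sketch of the plain JDK mechanism involved (not Lucene's actual code; the file name and offsets are made up for illustration):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class MmapSketch {
        public static void main(String[] args) throws IOException {
            // Map a file straight into memory, the same way a C program would call mmap(2).
            // Reads go through the page cache rather than the Java heap, so there is
            // nothing here for the garbage collector to track per record.
            try (FileChannel channel = FileChannel.open(Path.of("segment.dat"),
                    StandardOpenOption.READ)) {
                MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                // Fixed-width reads at absolute offsets, much like pointer arithmetic
                // over an mmapped region in C.
                long firstValue = buf.getLong(0);
                int recordCount = buf.getInt(8);
                System.out.println(firstValue + " " + recordCount);
            }
        }
    }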
Of course there is always room for more optimization. Lucene is well over two decades old now and they still regularly come up with major performance improvements. There's nothing magical about how they do that; just a lot of hard work that goes into it that would be somewhat challenging to improve on just by switching language and compilers.
That's just because there's no Lucene-equivalent C library with the same level of attention?
However, there are increasingly such libraries written in C++ (PISA) and Rust (Tantivy). They handily beat Lucene in benchmark suites [1], so it seems like Lucene does suffer from a Java penalty, despite getting even more developer attention than PISA and Tantivy, I would think.
No, because a lot of the work would end up being the same kind of work with no inherent advantages of one over the other. Easy to predict because there used to be a C implementation of Lucene. It couldn't keep up in terms of features or performance so work on that stopped a long time ago.
Most of the libraries you mention don't come close to implementing even a tiny portion of Lucene. Classic case of apples and oranges. Also, good examples of the niche things I was talking about. Benchmarks like the one you mention are kind of self-serving like that. They measure something but not everything, and probably very selectively. They are faster at what exactly? Under what circumstances? Why? As soon as you answer those questions, what inherent limitations do the Lucene developers have replicating that?
A lot here boils down to how the underlying search engine implements tokenization, stemming, language analyzers, fuzzy matching, and a few other things that you'd need to build a search engine that doesn't suck. The benchmark conveniently does not specify any of that; presumably because it lacks many of those features or has extremely naive implementations of them. That would be the kind of corner cutting I was talking about. What kind of relevance ranking is being used here? How good is it? Was that even evaluated or considered? Hint: Lucene gives you many options here. Search quality and performance are the big trade-off here. Anything that trades quality for performance is going to look good until you look at the quality.
Also, things like the index size and document volume are not specified. Or the hardware this ran on. Or the JVM configuration, compiler flags, etc. It's like benchmarking a Formula 1 car by how quick it is at parallel parking. Yes, maybe a Model T Ford is going to be faster at that. But is that even meaningful to look at?
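To make the "many options" point concrete, here is a minimal sketch of the kind of analysis and fuzzy-matching choices Lucene exposes (assuming a recent Lucene version; the index path and field name are illustrative):

    import org.apache.lucene.analysis.en.EnglishAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.FuzzyQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.MMapDirectory;
    import java.nio.file.Paths;

    public class LuceneSketch {
        public static void main(String[] args) throws Exception {
            // Memory-mapped index directory; tokenization and stemming via an English analyzer.
            Directory dir = new MMapDirectory(Paths.get("/tmp/idx"));
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new EnglishAnalyzer()))) {
                Document doc = new Document();
                doc.add(new TextField("body", "the quick brown foxes were jumping", Field.Store.YES));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // Fuzzy matching with a configurable edit distance; relevance scoring is pluggable too.
                TopDocs hits = searcher.search(new FuzzyQuery(new Term("body", "jumpng"), 2), 10);
                System.out.println("hits: " + hits.totalHits);
            }
        }
    }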
Sure - there's no doubt that lucene is a much bigger and more "enterprise" solution that is used professionally today by many big companies from Elasticsearch to Mongo to Solr.
But for core text search and indexing - tantivy does indeed support tokenization, stemming, fuzzy matching, and so on. For the core use case of performing a text search on a large corpus of text a lot of the functionality one would need is there - as you can see if you look at the list of queries in the benchmark. From the list of queries there's little I would miss from our "enterprise" use of lucene today - speaking for myself.
One might argue "oh but if you add some obscure features that 5% of people use like a real enterprise solution then it will slow down by 2x" - but I doubt it.
To me, tantivy and the like are technology demonstrators - they show that Lucene could be significantly faster if it weren't in Java.
The whole point of a search engine is getting you good results. Not alright results, or some random results in the wrong order, but the best possible results. Benchmarking relevance is much more important than benchmarking performance for search.
Lucene is a Swiss army knife for building something that gets you the best results possible. Those aren't enterprise features if you actually care about what your users do with search results. For example, because your sales are directly correlated to that. The trick with Lucene is to do as much as you can possibly get away with, as opposed to doing as little as you can and calling it a day, which seems to be the strategy here for proving X is faster than Y with this particular benchmark. It's faster because it does less useful work.
The proof of concept here would be building something that is as good and as fast. By not even trying to benchmark how good things are, you kind of make the point that you either don't know or don't care enough to even bother to measure it.
Why did you offer lucene as a counterpoint, if you deny that there’s any equivalent implementation elsewhere that isn’t lucene? That just means lucene cannot be a counterpoint if it just so happens that the only implementation is in java.
Because people have been trying pretty hard to replicate what Lucene does in other languages specifically because they thought they could do a better job. The reason the Java implementation continues to dominate is that it being implemented in Java has repeatedly proven to be not as much of an issue as people assume it to be. At least not enough to matter.
And since the article is called "why C is faster than Java" and the main argument is literally "here's this thing implemented in C that is fast", it's an excellent counterpoint to go "here's this other thing in Java that is fast".
Sure, I would agree that lucene is “fast enough”. But that doesn’t mean it couldn’t be faster if implemented in another language, and I believe that tantivy demonstrates that it could be.
This, however, throws out all the productivity that a language like Java gives developers compared to C, plus automatic memory management. The C equivalent would most likely be a buggy version with the typical memory-management issues, or would take much more time to create.
I am not a big Java fan. We are not talking about Haskell or Rust or a similarly safe language here. However, Java is still far ahead of C in that regard.
From what I’ve seen working in the financial industry, hardly anybody uses lucene directly. It is typically behind elasticsearch or solr or whatever - which then takes care of load balancing and replication and all that.
For these use cases, you could swap out lucene for something faster or more “low level” in implementation without affecting the typical application using it much, as all interaction happens across some rest API.
The point I am making is that you would have to write that lower-level or somehow different thing first. If you do it in C, then how do you make it as safe while still having the same functionality? In huge projects this has mostly not been achieved. Large C projects (and C++ projects) suffer from memory safety issues. This seems to be the general rule. Even the best make mistakes when working with big C code bases, or there are so few of those "best" that they cannot possibly create a huge project all by themselves.
If you replace a more or less safe implementation with a buggy, memory-unsafe C version of it, you are not going to make anyone happy about the performance improvement. That is assuming you put in "only" the same amount of time and number of people as the original implementation, not more.
The amount of things you need to keep in mind with all the memory management and other details when writing C code can also make you blind to other issues like proper input validation and implementing the invariants of application settings. You have to put a lot of energy into making sure you are not leaking memory and avoiding memory safety issues, so you lack that energy when it comes to higher-level issues. This directly affects developer productivity.
I mean, obviously using different languages has different trade offs in many dimensions. If speed were all that mattered then everything would be written in assembler.
But the point at issue is that by using java you are trading off speed (potentially for other benefits).
By the way - tantivy at least is implemented in rust so it shouldn’t suffer from memory safety issues just as lucene shouldn’t. And the rust language will provide some guarantees that java doesn’t - like freedom from data races.
> there's no a lucene equivalent C library with the same level of attention?
Actually, there is [1]. CLucene has been around for years; it is a re-implementation of Lucene in C++, and yet the only serious usage I've seen was as a file-indexing engine in KDE many years ago (although I might be missing other projects). Later it was dropped from KDE.
I've had (indirect) relations with search/indexing stuff for years, and no one ever complained about Lucene's speed or had any problems with it.
That’s not a counter-argument, because most Java developers can’t write such C-like code and more importantly they don’t want to.
The debate was always about idiomatic Java vs. idiomatic C.
What’s amazing to me is that I’ve had such discussions 15 years ago, 10 years ago, etc. I would have hoped that the Java community would have accepted the reality by now :-)
Still, there’s also an interesting new development happening in this area: the members of the Rust community have picked up the mantle of native vs. managed performance. By now they’ve spent so much social capital on “rewrite it in Rust”, that it’s not clear if this will be a positive for native dev advocacy, but the show will definitely be fun to watch.
> it’s not clear if this will be a positive for native dev advocacy
I've rewritten a few things in rust. Seems pretty positive to me, because you can mix some of the best optimizations and data structures you'd write in C, with much better developer ergonomics.
A few years ago I wrote a rope library in C. This is a library for making very fast, arbitrary insert & delete operations in a large string. My C code was about as fast as I could make it at the time. But recently, I took a stab at porting it to Rust to see if I could improve things. Long story short, the rust version is another ~3x faster than the C version.
The competition absolutely isn't fair. In rust, I managed to add another optimization that doesn't exist in the C code. I could add it in C, but it would have been really awkward to weave in. Possible, but awkward in an already very complex bit of C. In rust it was much easier because of the language's ergonomics. In C I'm using lots of complex memory management and I don't want to add complexity in case I add memory corruption bugs. In rust, well, the optimization was entirely safe code.
And as for other languages - I challenge anyone to even approach this level of performance in a non-native language. I'm processing ~30M text edit operations per second. A few years ago I tried something similar in JS and got about 100k edits per second. (300x slower)
But these sort of performance results probably won't scale for a broader group of programmers. I've seen rust code run slower than equivalent javascript code because the programmers, used to having a GC, just Box<>'ed everything. And all the heap allocations killed performance. If you naively port python line-by-line to rust, you can't expect to magically get 100x the performance.
It's like, if you give a top-of-the-line Porsche to an expert driver, they can absolutely drive faster. But I'm not an expert driver, so I'll probably crash the darn thing. I'd take a simple Toyota or something any day. I feel like Rust is the Porsche, and Python is the Toyota.
Re-implementations often turn out faster simply because you understand the problem better, which seems to be the case here.
The same algorithm is not going to run faster in Rust than tuned C, period. Just like C is never going to run faster than tuned assembly. I wish we could move on to discussing something that matters.
> The same algorithm is not going to run faster in Rust than tuned C, period.
Weirdly enough, before I added that optimization my Rust code was ~20% faster than the C code anyway. And I have no idea why - the programs were (as far as I could tell) identical. I was using the same compiler backend (LLVM), and this was before alias analysis was turned on for Rust. And -march=native in both cases.
Could just be a weird coincidence of the compiler’s inlining decisions - though I suspect not. I tried investigating it but I don’t understand x86_64 assembly enough to understand how the binaries differ.
But nor does most code require that sort of control over memory.
E.g. would a program with very diverse code paths improve that much from fitting data into L1 cache better? I am genuinely asking. Because of course for things like PipeWire handling audio streams it is very important and Java would not be a good fit. But for a web application I really don't see C beating Java by much, if at all (and the TechEmpower benchmarks show native languages at the top only when those are heavily optimized very specifically, e.g. depending on the exact format of HTTP requests).
Applying the same argument to C, C isn't a system programming language, because the use cases where it is used as such rely on compiler extensions not required for ISO C compliance.
You also see the same thing in fintech areas like ringbuffer-based order matching engines. Sure, with an infinite amount of time you could make a C implementation run a little bit faster, but practically speaking you would have a hard time building something that gives the business the same level of confidence as a Java/C# solution. Especially if there are piles of manual memory management and ASM hacks going on.
In my experience, C#/.NET is in a really interesting position when it comes to these questions. It does support value types out of the box which can add another order of magnitude over Java implementations of the same code. One example of this is the LMAX Disruptor. This was originally written for Java and then ported to C#. Because C# supports value types, a special ValueDisruptor variant was developed specific to its port, which enables performance that would otherwise be impossible in the Java solution (which is constrained to using reference types in the buffer).
You definitely don't need an infinite amount of time to make C or C++ implementation of a matching engine run faster than a Java based one.
I've worked at two fintech companies, a prop shop and an investment bank and seen some APIs for other companies' matching engines and they've pretty much all been C/C++.
I don't see how this is a counterargument, or highlighting any fallacy.
The article talks about hacks used to get around garbage collection in Java and increase performance, including memory mapped files and byte buffers. It also talks about the pain points of using these hacks and the costs they have to pay despite these hacks.
It also concludes that it is practical to build Git in a higher level language and JGit performs reasonably well, despite not getting quite C level performance or its memory utilization.
Where does one go to learn all these optimization strategies and tactics? I really want to take a more deliberate approach to optimizing my (Go) code, but I’m currently stuck banging on things with a wrench between benchmarks.
Frustratingly, most advice I get is non-actionable, and amounts to restating general principles like “be aware of cache”. Are there any books or online resources that can help me build a more principled approach to runtime performance?
The general process I’ve seen is to profile the application and look at what’s taking the most time. Then you sit down and think about how you could make the slowest parts faster. There isn’t a one-size-fits-all answer to your question. Benchmarks are good for validating your improvements, but not for finding what to improve. That’s what profiling is for.
Yeah; this is what I meant by “banging on things with a wrench between benchmarks”. I can’t fight the feeling that there is some background knowledge that makes the search more directed.
There is some background knowledge, like knowing the time and space complexity of the algorithms and data structures you use and how these abstractions interact with the underlying hardware on which you are running your application. Knowing how the CPU, CPU caches, OS memory management, and kernel task scheduling (context switching, etc.) work, and - since you mention that you work with Go - the Go internals (goroutine scheduling, how memory is allocated by the Go runtime, how the GC works, etc.), and how they impact your workload. A lot of it is intuition that you get from working with code that needs optimising over the years, and a LOT of reading, but sometimes you can flush that intuition into the toilet because you often find bottlenecks in places you wouldn't think about. So it really depends on the nature of your application (is it a network application or HPC? multi-threaded? single-threaded? distributed? are you optimising for high throughput or low latency? do you control the hardware, network, etc.?)
Interestingly, I've found studying other engineering disciplines outside of software engineering to be the most useful. The best explanations and modeling frameworks for concurrency I learned were from network engineering books and a couple of hardware design classes I took.
Re: principles around performance analysis and diagnostics, Brendan Gregg is a name you should look up. His book on systems performance is a tremendous resource in both principles and methodology. Even though it's not focused on coding specifically, the same principles apply.
> restating general principles like “be aware of cache”
It's a very powerful concept, and there really isn't all that much beyond it. That's the problem with pretty much all of the deep truths about computing, they're frankly pretty simple once you grok them.
Well, it is not Lucene alone; there are many tools written in Java around Lucene, such as Elasticsearch/Solr/Elassandra and so on, so no language or runtime can replace everything. Eventually something might catch up. That is why it is not about the language alone! It is about the developers, runtime, ecosystem, problem domain, and mindshare.
Lucene is the typical example of software that works well despite not being as fast as possible, because its features are more important than the raw speed that C could provide. Nothing wrong with that. After all, that's why we buy more hardware. However, it doesn't disprove the point of the article. It could be possible to build something much faster than Lucene if there were the interest and manpower to do that in C.
One needs to look at the amount of resources it takes to run Lucene or enterprise solutions built on it like Elasticsearch. Java-based solutions work because enterprises can throw a lot of resources at them. But that does not mean they are efficient or fast. Otherwise the Java folks at Oracle wouldn't be spending a decade-long effort on a flatter memory layout for Java objects.
How do they cast raw memory pointers into Java types? In C, you just cast a void pointer to a pointer to the intended type, and voila - that piece of memory is interpreted as if an object of that particular type resided there. Is anything of this sort possible in Java?
It is quite interesting that most of the problems mentioned don't exist in recent versions of C# on .NET Core, considering all the similarities of C# and Java.
I would even say, some of the problems didn't exist in C# in 2009. C# always had value types with configurable in-memory layout. It also has a very good mmap solution. It also allows you to hand-optimize things using unsafe blocks.
> C# always had value types with configurable in-memory layout. It also has a very good mmap solution. It also allows you to hand-optimize things using unsafe blocks.
And C has inline assembly. Doesn't mean that most C code will use inline assembly.
Back in 2009, a lot of git utilities were still written in scripting languages. Not sure when it started, but the porting activity of those utilities to C is still ongoing. So the maintainers still want to use a lower level language today.
In other projects in the VCS space, we are seeing a similar trend. Hg, originally a project written in Python, is being rewritten in Rust by Facebook, one of the big users of it.
Sure, maybe you could have used C# together with some niche features. But it's not going to be fun compared to a language that has zero cost abstractions and that runs on the bare metal.
Even if your problem domain demands a managed environment, like extensibility with plugins, I still suggest using Rust together with wasm. It's the first choice thanks to its great type system, powerful static analyzer and first-class support for resource management that garbage-collected languages lack.
I think in this scenario it's totally germane to mention Rust because the problem described in the linked post is exactly the problem that Rust was designed to solve: providing sufficiently precise control over low-level runtime behavior that you never hit a "sorry, it's not possible to do that optimization in this language" situation, while still (arguably? hopefully?) qualifying as a "higher-level language" in the relevant sense. In particular, every problem with Java that the post describes has a straightforward solution in Rust, and this kind of thing is why Rust exists instead of, e.g., Mozilla just rewriting Firefox in an existing managed language with a garbage collector.
That being said, GP seems to imply that Rust should be the default choice for basically every problem, which goes way too far. Not every application needs this kind of low-level control. Maybe even most don't (although I look forward to a future where it's easy to drop into Rust from a managed language when you hit a performance wall; I think this has been mostly achieved for Python, but not yet for other languages). But some do, and it sure sounds like Git's one of them.
Rust is a low level language no matter how productive it may be.
The memory layout will simply leak into the program architecture and will have to be altered on refactors — something which is transparent with managed languages.
What do you mean here by memory layout? For instance, the order of fields in a rust struct can (theoretically) change by recompiling. It's not defined by the order of fields in the definition.
On a language level, high-level APIs will necessarily contain details of things like (mut) references, Box, whatever. Which is not a problem at all, given the problem domain, but in my opinion it is not possible to make a language that is both low and high level at the same time (and it is not really needed either).
Git is the subject of the linked e-mail. Mercurial is the big contender to Git that is not written in C. Their response to Hg's performance issues was not to use or create some Python feature that allows them to speed up some fast paths, but to use a proper low-level language in the first place, which happens to be Rust. I'm not sure you can get more relevant to the discussion than this.
The trend seems to be moving away from high-level languages in the VCS space. Developer time is one of the most expensive resources that FAANG pays for, so any kind of investment in performance improvements is going to pay off quite well.
Is there a term for this phenomenon yet?
"if another language is being discussed, Rust must be forced into the discussion, no matter how tenuous the connection"
This always happens with whatever language is in vogue at the time. Now it's Rust. It used to be Go (which still has a little juice left). Before that, Clojure and Haskell both had runs. And before that… hell, I remember when Java was talked about this way.
This is the natural order of things and is good.
And the proper term for introducing Rust should be “oxidation”.
Elixir, RoR and Node.js (and Python a couple of times) spring to mind. Some of those languages have found a niche. But a lot of new languages made older languages nicer by getting them to adopt language/framework features.
I’m not a…rustafarian?…but we didn’t get as cross when C# was mentioned above, in a thread about Java and C. In fact it’s top comment at my time of reading.
> Unfortunately, none of them ever seem to show up.
We do from time to time, but people assume our language is dead (it isn't). I learned it last year and I've been very impressed by how simple it is, given the speed you get with it.
It was a "big language" at the time, but now it's a language smaller than Rust or C++ which offers good performance with straightforward syntax. Ada also has a package manager now which includes toolchain install.
Ada has inline assembly, easy usage of compiler intrinsics, dead-simple binding to C, built-in multi-tasking (which includes CPU pinning), a good standard library, RAII, and real honest-to-goodness built-in, not-null-terminated strings. It's a compiled language, so you get good speed in general, but the built-in concurrency really does help work which can be split up. Ada 202x is getting even finer grained parallelism (parallel for-loops) in the language itself to even further help this.
And/or a lot of misconceptions. I showed up many times as well with those links, and explanations and whatnot.
I recommend https://blog.adacore.com/, too. Ada/SPARK is great when you want formal verification, and your checks to be done by GNATprove; statically, instead of dynamically. FWIW, you can disable runtime checks in Ada.
I've heard all sorts of things about ADA. The main thing keeping me from delving in has been the lack of general info about it. Thank you for the links! I'll be taking a look through these. What kinds of projects are people building in ADA these days? I'm interested in it primarily for robotics.
I use Ada as my alternative to C, when I don't feel like doing C++.
I've written a few tools for myself, including a command line code discover tool for large code bases (tens of millions of lines). There's a bunch of embedded work being done with it.
Make sure you use "Ada" rather than "ADA". Some people might give you trouble about it--it's not an acronym, just a name :)
Ada is a bit verbose for my tastes. Nim [1] is fast like C - I have yet to find anything rewritten in Nim turn out slower. It's safe-ish like Rust { there is an easily identifiable subset of unsafe constructs }. It's kind of like Ada, but with Lisp-like syntax macros/metaprogramming and Python-like block indentation (Lisp folks always said they "read by indentation" anyway). Nim also has user-definable operators and many other features. Compile times are very short while the stdlib is big-ish.
Small sample statistics, but three or four times now I have re-written Rust in Nim and the Nim ran faster. Once you can do inline assembly/intrinsics in a PL, most "real world" benchmarks reduce to a measure of dev patience/time/energy not the language. They also become "multi-language" solutions (if you count SIMD asm as a language which I think one should). Even slow Python allows C/Cython modules which in the real world are absolutely fair game, and you can call SIMD intrinsics from Cython pretty easily, too. Since we have few ways to quantify dev patience/attention objectively, these "my PL is faster than yours" discussions are usually pretty pointless.
The old term for .NET/Java was "Managed" languages. "Managed C++", "C# is a managed language", because they all manage your memory for you.
Rust's primary language feature - the borrow checker - is about adding compile-time checks on resource management (mainly memory), and the original article talks about boxed vs. value types being a major source of inefficiency.
So talking about Rust in a comparison of C and Java mentioning memory indirection bottlenecks seems about the most relevant place to discuss it.
When most people talk about C# and Java, they refer mostly to application development. You rarely hear about these languages in systems programming (doable, just rare). Rust is at the C/C++ level when it comes to systems programming, eliminates a lot of C/C++ issues, and yet adds features found in Java and C#, and even Haskell. People just don't know enough about Rust to criticize it, yet they see it mentioned everywhere. I can understand if some feel a bit "fed up" seeing Rust brought up in a non-Rust thread. But I do agree with you, Rust is very relevant for the discussion here.
It's actually the opposite. If anything, being evangelical about Rust is heavily discouraged.
The truth is Rust is an amazing language, with its own warts (async, Pin, etc.), but there is pent-up demand for a language that fits its description: a non-manual, non-GC, low-level-oriented language. It's no wonder some projects are switching to Rust.
> Can hardly blame people for talking about modern languages in a discussion about obsolete ones.
The point is that the issue does not involve people discussing "modern languages", just mindlessly shoehorning references to Rust into any discussion involving any application of a language which is not Rust.
I get Rust fanboys are excited about their hobby, but this sort of obsessive "when the only tool you have is a hammer" discussion is very tiring and fruitless, and only conveys a poor image of Rust's community.
So, let me get this straight: We have a thread about a programming language (Java), then it gets compared to another programming language (C#), then it gets compared to a third one (C) and no one bats an eye. But when Rust is mentioned it's because of "fanboys". Yeah, sure.
> So, let me get this straight: We have a thread about a programming language (Java) (...)
No, you really don't. If you read the thread you're commenting on, you'll notice it's about C#.
The very first comment of the thread you're discussing in, and also the top post of this discussion, is, and I quote:
> It is quite interesting that most of the problems mentioned don't exist in recent version of C# on .NET Core, considering all the similarities of C# and Java. (...)
And somehow Rust fanboys parachute into the discussion to yet again talk about their hammer handling all nails and nail-like problems.
The thread I'm seeing is a top-level comment about C#, a reply that is on-topic and mentions Rust, and also assembly, Python, Hg, "scripting languages", and wasm.
Rust is exactly as relevant here as any of those other items, but people are getting really upset about the Rust mention.
I think in a discussion that already started by comparing different performance characteristics in different languages in a VCS, it's not at all out of line to bring up the fact that another VCS is being rewritten into any particular language. It seems to me that the anti-Rust sentiment is far more disruptive and off-topic here than the mention of Rust in the first place was.
> But it's not going to be fun compared to a language that has zero cost abstractions
C# has them. For instance, interfaces used as generic type constraints are zero cost.
Another thing, some C# abstractions are very low cost. Critically to this thread, the Span<T> abstraction is low cost, pretty much the same thing as a pointer+length in C. It's easy to design an abstraction which uses spans of bytes backed by a memory-mapped file, and the performance is going to be pretty similar to C.
> C# has them. For instance, interfaces used as generic type constraints are zero cost.
Depends on what we mean by "zero cost". For instance, interface constraints themselves may not have a "cost", but there are many cases where this means that the calls involving that generic type will be virtual (unless you're doing fun patterns like `where TComparer : IEqualityComparer<T>, struct`). If you poke around at the internals of System.Linq you'll see there's a lot of checking to use specialized types depending on the collection in order to minimize costs.
And that's what you'll see a lot of in the .NET Standard bits; even in the past we've had some fairly low cost abstractions in places. SocketAsyncEventArgs, if a little arcane at first, is a good design for its time, and System.Linq.Expressions has been a great way for users to minimize the cost of things like reflection without having to write bytecode.
That said, some abstractions are deceptively costly; the 'new' generic constraint is definitely not zero cost, unless that got fixed in 6.0.
> unless you're doing fun patterns like `where TComparer : IEqualityComparer<T>, struct`
These fun patterns are precisely the generic type constraints I mentioned in my comment. I do use them when performance matters; here's an open-source example: https://github.com/Const-me/Vrmac/blob/1.2/Vrmac/Draw/Main/I... That code is from a 2D vector graphics library; these interface methods may be called at 10 kHz frequency or more. Displays are often 60 Hz, and the methods are called a couple of times for every vector path being rendered.
> If you poke around at the internals of System.Linq you'll see there's a lot of checking to use specialized types depending on the collection in order to minimize costs.
Linq is awesome, but I’m pretty sure it was designed for usability first, performance second. I tend to avoid Linq (and dynamic memory allocations in general; delegates use the heap) on performance-critical paths. YMMV, but in most of the code I write, these performance-critical paths make up way under 50% of the code base.
> 'new' generic constraint is definitely not zero cost
If you mean the overhead of Activator.CreateInstance<T> when generic code calls new() with the generic type, I’m not 100% certain but I think it’s fixed now. According to https://source.dot.net/, that standard library method is marked with [Intrinsic] attribute, the runtime and JIT probably have optimizations for value types.
You should read ISO/IEC 9899:2011 J.5.10 "The asm keyword". It's the same section in the C18 standard. It's the bit describing the way an ISO C certified compiler can provide inline assembly.
The comparison is against Java because it has certain feature parity with C#. And it is right, C# code can be brought closer to C level of performance with less effort than in Java.
I watched a conference presentation by ScyllaDB, and a lot of the reasons given for their perf boost using C++ over Cassandra's Java seem like things C# might address now in 2021. Span<T> in particular is a perf game changer for this kind of stuff.
Would be interesting if C# now would be a viable alternative to C++ for them.
Hey. ScyllaDB employee here. There are several reasons C++ was used and I don't think Span ultimately matters. A list from the top of my head (ordered randomly):
1) we use intrusive containers, so the memory managing the container data structures is collocated with the actual data.
2) memory allocation is not tied to GC, so we don't get pauses
3) there's almost no synchronization between different threads and there are (almost) no globals. For a story about why globals are a killer for performance, read https://www.p99conf.io/2021/09/28/hunting-a-numa-performance...
4) the previous point is only possible with the existence of a user-space scheduler which guarantees that specific threads are pinned to a single CPU. Also, there's no need to call mmap multiple times, as Seastar (the concurrency framework written with Scylla in mind) allocates the whole system memory up front and takes advantage of overcommitting in Linux. There's no syscall at memory allocation, just some userspace work and a possible page fault.
I'm not sure whether C# can do away with these problems? Let me know if you know. That being said, modern C++ is really convenient. Not like anything you saw 15 years ago in university.
As a matter of fact, I'm still at university and mine actually showed a fair bit of modern C++ (University of Warsaw here), so I don't feel that applies to me. As for C#, I don't claim I know stuff; I'd just like to learn something new. If you stop at an assertion like yours, sadly I don't learn anything new.
As proven by occasional threads on /r/cpp, and including complaints from Bjarne himself in some of his talks, that is unfortunately not yet a common practice.
Regarding C#, if you really want to learn how to do C++-style programming in C#, have a look at the documentation for C# 7.0 - 7.3, C# 8, C# 9 and C# 10 covering readonly structs, Span, stackalloc in safe code, blittable types, GC-free regions, malloc/free calls, allocation-free memory pipelines, in parameters and ref return types, local references, and the using pattern (implementing IDisposable is no longer required).
Regarding classical C# (what is available until .NET Framework 4.8), you have structs, value types, manual memory management via System.Runtime.InteropServices.
> actually showed a fair bit of modern C++ (University of Warsaw here)
You mean like C++11? So the C++ standard from 11 years ago? Or C++14? C++17? The last time I checked UW was like 17 years ago, so maybe things have changed, but back then they were like 10+ years behind industry in practical terms.
As for databases, C++ has not only a performance edge over Java (and possibly .NET) but it also offers superior non-memory resource management capabilities. Databases manage a lot of resources that are not memory, and RAII is a game changer.
Your linked post is not really a good example for that — escape analysis is very finicky without language-level semantic guarantees the compiler could use. With the proposed Valhalla changes Optional will be a value-class and these optimizations become trivial.
Especially when you return a value, it is more than likely to escape.
Optional is only a part of the picture here. It also missed:
* branch elimination with cmov
* loop unrolling
* SIMD vectorization
* turning heap allocation into stack allocation
All those things could be done without breaking any semantic guarantees of Optional even without value types in place.
Also note how even forcing the Rust program to use references with double Box didn't make the code any worse. So Rust/LLVM had no issue optimizing that out even if Option was defined the way it is in Java now.
A lot of the problem stems from Java’s boxing: because the first n values are cached, escape analysis can’t remove the boxing reliably, and that cannot be fixed without breaking some applications.
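A small illustration of the cache in question (the default range is -128..127; the upper bound is tunable with -XX:AutoBoxCacheMax):

    public class BoxingCache {
        public static void main(String[] args) {
            // Integer.valueOf() returns a shared, pre-allocated object for small values,
            // so the boxed value is globally reachable and cannot simply be treated as a
            // throwaway local object by escape analysis.
            Integer a = Integer.valueOf(100);
            Integer b = Integer.valueOf(100);
            System.out.println(a == b); // true: same cached instance

            Integer c = Integer.valueOf(1000);
            Integer d = Integer.valueOf(1000);
            System.out.println(c == d); // false: two fresh heap allocations
        }
    }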
Java is capable of all of these optimizations though — but I am not an OpenJDK dev so I’m getting out of my depth here.
Of course you have less time/resources during JIT compilation (and, mostly, less inline depth), so the quality of the resulting code can at times be vastly worse than what an AOT compiler can do, but my experience is that in real-life code bases Java’s JIT compiler is really great, while this benchmark reflects a singular case where it failed.
> Java is capable of all of these optimizations though
In theory - yes.
In my experience it just repeatedly does a worse job than a C / C++ / Rust compiler, unless I'm very careful with my Java coding (yes, I can often make it close, but this requires very non-idiomatic Java code; e.g. I've seen cases where manually unrolling a loop yielded 2x more performance, which is something I don't recall ever having to do in C / C++ / Rust).
For example, we don't use Java Streams in performance-critical code, because everybody on the team knows the JIT does not optimize them back to the level of simple for loops. Well, we checked many times and it simply never happened, although theoretically it could. But I can freely throw a chain of map/filter/fold calls into C++ or Rust and it runs as fast as a hand-optimized loop, with unrolling, SIMD, etc.
JMH is a standard tool we use for performance comparisons.
For context, see Scala's battle with specialization to get reasonable performance out of collection transformations. Once you start using lambdas to define e.g. a filter condition, and once you want generic implementations working on different item types, this pushes you into boxing hell, the JVM is surprisingly reluctant to remove all that overhead, and you end up with a >10x penalty. So instead of relying on the JVM, they specialize data structures for primitive types. It is even something that you are supposed to do in Java manually (see the IntStream, LongStream classes).
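A sketch of the kind of comparison being described (class and method names are purely illustrative): the boxed pipeline creates an Integer per element unless the JIT manages to eliminate it, the primitive-specialized stream avoids boxing by construction, and the plain loop is what performance-critical code tends to fall back to.

    import java.util.List;
    import java.util.stream.IntStream;

    public class SumVariants {
        // Boxed pipeline: each element is an Integer object on the heap.
        static long sumBoxed(List<Integer> xs) {
            return xs.stream()
                     .filter(x -> x % 2 == 0)
                     .mapToLong(Integer::longValue)
                     .sum();
        }

        // Primitive-specialized stream: no boxing, but still lambda-based.
        static long sumIntStream(int[] xs) {
            return IntStream.of(xs)
                            .filter(x -> x % 2 == 0)
                            .asLongStream()
                            .sum();
        }

        // Hand-written loop: the baseline the JIT output is being compared against.
        static long sumLoop(int[] xs) {
            long total = 0;
            for (int x : xs) {
                if (x % 2 == 0) total += x;
            }
            return total;
        }
    }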
What matters is the time it takes from when I start a request until it completes. Some of my code isn't long-running; in that case HotSpot doesn't do anything for me, but it would be wrong to contrive a long-running example to show that Java can be faster if HotSpot engages. Other processes run for a long time and Java may have an advantage.
Other than memory safety, simplicity, dependency management and build tooling, the .NET standard libraries, the open source library ecosystem, and so on...
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
> Not feeding the troll but outside memory safety, everything else you list exist in the C++ ecosystem with generally better alternative than in C#.
Oh? I don't think you can dispute that the C standard library is very limited and the state of dependency management / build tooling is very poor. And that actually limits the usability of the open source library ecosystem quite a lot; maybe there are more C++ libraries out there, but you can't just type what you want into the nuget search bar and get on with using it.
Simplicity is in the eye of the beholder, but the very weak semantics of C++ templates mean you can't reason compositionally about C++ code, whereas in C# it's relatively easy to have a codebase that you can reliably understand piecemeal.
> And as soon as you do touch mmap , unsafe area or native code in C# you loose memory safety too anyway.
In principle yes, but if you keep those points very rare then you can subject them to extra review etc. at a level that would be impractical with a C++ codebase (where even "a + b" is undefined behaviour in the general case). Memory safety vulnerabilities in real-world C# codebases are rare.
> I didn't spent a lot of hours in C++ world, but it never felt simple
C++ is not simple.
But presenting C# (or Java) as "simple" is equally hypocritical. The JVM or the CLR and their associated frameworks are monsters of complexity, engineering and legacy that require close to an entire lifetime to be mastered entirely.
C# (or Java) are "accessible", meaning a newbie developer can produce something halfway baked in these languages relatively quickly.
Just because the JVM or CLR are complex *doesn't* mean that writing good C# / Java requires you to be proficient at the CLR/JVM level, or that it is hard because of that.
> meaning a newbie developer can produce something halfway baked in these languages relatively quickly.
A newbie developer can produce mediocre solutions in all of those - C#, Java, C++.
The difference is that in the C#/Java world it may be slow and in the C++/C world it may be exploitable (more likely) <snark>.
Anyway, in my world very often it's not about internals, but about modeling skills, about OOP, testability. Those are some of the ways of measuring how good the code is.
Good system modeling skills are way above technology
How exactly are they not simple? Well, not C#, because it has a bit of a problem with feature creep similar to C++, but Java is a really tiny language compared to... anything.
And you don’t have to be a master of the JVM - chances are you are not a gcc/clang maintainer either, and yet you can write performant-enough, correct code.
N ways to do something, but in exchange you can get good solutions in C++. In the C# world you are locked to a mediocre compiler, with a mediocre package manager, a substandard (and complicated!) build system and an unacceptable code formatter, for example.
By package manager I mean NuGet. The last time I used .NET (one year ago), ".NET Core" was a target platform and had already been renamed to ".NET".
No, that's not a preference. I'm not complaining about a lack of options; I really don't care how code looks, as long as it all looks the same. And it fails at that. It quite often simply takes the code as it is and indents it a little bit. Clang-format (and rustfmt and dart format and plenty of others) give you the nice, tidy and homogeneous code layout I expect from an auto-formatter.
IIRC there were some changes around the .NET Framework -> .NET Core transition in how it works (where packages are stored), and that's why I said that since .NET Core I haven't had problems with it.
No it isn't. With RAII, you can look where an object gets constructed and know exactly where it will be destructed. With garbage collection, you can't, and in fact there's no guarantee that it ever will be. Also, with garbage collection, you can save references to whatever you like, wherever you like, for as long as you like. With RAII, you need to make sure you don't create any dangling references or use any dangling pointers.
No, with RAII you still need to design your program around who owns each object, and thus who should clean it up. You end up with borrowing, move semantics and others. With (Tracing/Copying) Garbage Collection, none of this exists.
Not to mention, Copying GC also solves memory fragmentation, which C++ still suffers from unless you also design your allocations carefully around sizes of types.
> No, with RAII you still need to design your program around who owns each object, and thus who should clean it up
With or without RAII you should design your program around who owns each object, unless you want to end up with an unmaintainable mess leaking file descriptors, network sockets, or native memory buffers, or trying to access resources after closing them. Which is why Cassandra and Netty implement their own reference counting.
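A minimal sketch of what that looks like with Netty's ByteBuf API (the buffer size and usage here are illustrative):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.PooledByteBufAllocator;

    public class RefCountSketch {
        public static void main(String[] args) {
            // Pooled, off-heap buffer: the GC never reclaims this memory on its own,
            // so ownership has to be explicit via reference counts.
            ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(4096); // refCnt == 1
            buf.writeLong(42L);

            buf.retain();   // a second owner takes a reference; refCnt == 2
            buf.release();  // that owner is done; refCnt == 1
            buf.release();  // original owner is done; refCnt == 0, memory returns to the pool
        }
    }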
> Not to mention, Copying GC also solves memory fragmentation
Not really. It only moves the problem elsewhere so it doesn't look like fragmentation. A compacting GC needs additional memory to have room to allocate from, and that amount of memory is substantial unless you want to do more GC than useful work. Also, it is not free from fragmentation most of the time - the heap is defragmented only at the moment right after compaction. As soon as your program logically frees a memory region (by dropping a path to it), you have temporary fragmentation until the next GC cycle, because that region is not available for allocation immediately. And there is internal fragmentation caused by object headers needed to store marking flags for GC - which can consume a huge amount of memory if your data is divided into tiny chunks.
> which C++ still suffers from unless you also design your allocations carefully around sizes of types
Modern allocators split allocations into size buckets automatically.
> Compacting GC needs additional memory to have a room to allocate from, and that amount of memory is substantial unless you want to do more GC than any useful work.
Not in the case of a mark-compact collector, which works entirely in place, or a mark-region collector such as Immix [0], which only copies a small fraction of the heap.
> Also it is not free from fragmentation most of the time - the heap is defragmented only at the moment right after compaction.
An improvement would be to perform more frequent "partial" collections, such as in the Train algorithm [1]. But some collectors (such as Immix again) avoid compaction until fragmentation is considered bad enough, which seems like a fair compromise.
> And there is internal fragmentation caused by object headers needed to store marking flags for GC - which can consume a huge amount of memory if your data is divided into tiny chunks.
The description of Doug Lea's allocator [2] suggests there are also "object headers" of a sort on allocated data in dlmalloc. You could probably steal mark bits from those headers, but it is common to use a separate marking bit/bytemap which is kept apart from the space where objects are allocated, and thus has none of the fragmentation you describe.
> Not in the case of a mark-compact collector, which works entirely in place, or a mark-region collector such as Immix [0], which only copies a small fraction of the heap.
The mutator always allocates from a contiguous memory region. It can't allocate from the memory that was logically released, but not yet collected. So it needs more total memory than the amount of live memory in use at any time, unless you have an infinitely fast GC (which you don't have). In order to avoid too frequent GC cycles, or to allow it to run in the background, you need to make that additional amount of memory substantial.
JVM GCs typically try to keep low GC overhead (within single %), which often results in crazy high memory use, like 10x the size of the live memory set.
> but it is commmon to use a separate marking bit/bytemap
Sure, you can place it wherever you wish, but it still requires additional space.
Your comparison would only be fair if the alternative (malloc/object pool) did not have more memory than strictly necessary either.
But malloc and friends usually do what a very basic GC would (make separate pools for differently sized "objects").
Object pools also need much more memory unless they are full.
So all in all, GCs do trade more memory for more efficient allocation/deallocation, but that is a conscious (and sane) tradeoff to make for like 99% of applications, as memory sitting in RAM doesn't consume much energy compared to doing GC cycles like a madman. Also, it is quite configurable in the case of JVM GCs.
The only overhead memory used by a pool allocator is the rounding to the page size. The difference from a compacting GC is that a pool allocator can allocate from the freed memory immediately after the memory was freed. So the overhead does not depend on the allocation rate, it is just a tiny constant factor.
As for the energy efficiency, I seriously doubt that bringing all memory into cache once in a while, including memory that is not needed frequently by the application, only in order to find live vs dead memory is all that energy efficient. The allocation itself is indeed typically slightly faster but the marking and compaction is additional cost you don't have to pay in manual memory management.
Hence why I'd suggest using partial GCs like the Train, as that would have better locality of reference almost all the time. A generational GC could have similar effects, but nurseries seem to be much larger than caches nowadays, with few exceptions.
Partial, generational or region-based GCs still need to scan the whole heap from time to time. By bringing stuff into cache once in a while they also push stuff that's actively used out of cache. Those effects are typically not visible in tiny benchmarks that allocate temporary garbage in a loop, but can get pretty nasty in real apps. LRU-cache-like memory use patterns are particularly terrible for generational GCs - because the generational hypothesis does not hold (objects die old).
Also using generational algorithms does not remove the dependency of the memory overhead on the allocation rate. Those techniques improve the constant factor, but it is still an O(N) relationship, vs O(1) for a manual allocator. If the allocation rate is too high there are basically two solutions: (1) waste more memory (use very big nurseries, oversize the heap) or (2) slow down / pause the mutator.
The industry seems to prefer (1) so that probably explains why I never see Java apps using <100 MB of RAM, which is pretty standard for many C, C++ or Rust apps; and 50x-100x memory use differences between apps doing a similar thing are not that uncommon.
> By bringing stuff once a while into cache they also push stuff that's actively used out of cache.
I may very well be wrong, but I don’t think it is any worse than the occasional OS scheduling/syscall, etc. GCs happen very rarely (unless of course someone thrashes the GC by allocating in hot loops).
Also, while a destructor is indeed O(n) it is a cost that has to be paid on the given thread, while GCs can amortize it to a separate thread.
Fortunately, with GC, you can avoid thinking about the many small objects you constantly allocate along the way. Most of them will get collected in the next GC run as young-generation garbage going out of function / block scope. Some of them will travel down the call graph and may end up long-living, then eventually collected.
But I agree: for anything that you want to deallocate deterministically, or at least soon enough, you need to track ownership, and care about the lifetimes. Such objects are relatively few, though.
> Most of them will get collected the next GC run as a young generation going out of function / block scope.
Depends on the use case. Not if you're storing them in a long living collection.
Also heap allocation is costly, even in languages with fast heap allocation. It is still an order of magnitude slower than stack allocation.
> But I agree: for anything that you want to deallocate deterministically, or at least soon enough, you need to track ownership, and care about the lifetimes
It is not only that.
You need ownership not only to determine lifetimes.
You need to know it in order to be able to tell if, having a reference to an object, you're allowed to update it and in what way. Is it the only reference? If it is shared, who also has it and what can it do with it? If I call "foo" on it, will I cause a "problem at a distance" for another shareholder? Being able to answer such questions directly by looking at the code makes it way easier to navigate in a big project written by other people.
In C++ if I can see a simple value or a value wrapped in a unique_ptr, I know that I can update it safely and nothing else holds a reference. If I see a shared_ptr, I can expect it is shared, so I have been warned. The intent is clear. In Rust it is even safer, because the compiler enforces that what I see is really what I get (it is not just relying on conventions).
On the flip side, GC-based languages tend to invite a style of coding where reference aliasing is everywhere and there are no clear ownerships. I can see a reference to something and I have no idea what kind of reference it is and what I can safely do with it. It is just like a C pointer. I need to rely on code comments which could be wrong (or read a million lines of code).
That’s what OOP should handle though. You shouldn’t let internal objects escape if it is not intended.
Don’t get me wrong, I really like RAII and Rust’s compiler-enforced ownership model, but it doesn’t solve everything. E.g. it only disallows data races, not race conditions.
Also, immutability goes a long way toward solving all that.
I meant tracing garbage collection. I'd say that something like 95% of allocations in real-world code can be done straightforwardly with RAII, or could be if the language supported it (and indeed gain maintainability benefits from being forced into an RAII-centric paradigm). But the remaining 5% is a real pain, and distributed over a wide variety of problems in a wide variety of domains. So tracing GC really does make life a lot easier, if you can afford it.
The freedom to reference anything easily from any place is a double-edged sword. I agree it makes the 5% of hard issues go away, but on the flip side it makes the other 95% more complex. Tracing GC is the "goto" of memory management. You may argue goto is a good thing because it offers you the freedom to jump from anywhere to anywhere and you're not tied to the constraints enforced by loops and functions. We all know this is not the case. Similarly, being able to make a reference from anywhere to anywhere leads to programs that are hard to reason about. We should optimize for readability, not the ease of writing.
There is no reason why you could not, in principle, have Rust-style compile-time borrow checking in a managed language.
As an extreme example (that I have occasionally thought about doing though probably won't), you could fork TypeScript and add ownership and lifetime and inherited-mutability annotations to it, and have the compiler enforce single-ownership and shared-xor-mutable except in code that has specifically opted out of this. As with existing features of TypeScript's type system, this wouldn't affect the emitted code at all—heap allocations would still be freed nondeterministically by the tracing GC at runtime, not necessarily at the particular point in the program where they stop being used—but you'd get the maintainability benefits of not allowing unrestricted aliasing.
(Since you wouldn't have destructors, you might need to use linear instead of affine types, to ensure that programmers can't forget to call a resource object's cleanup method when they're done with it. Alternatively, you could require https://github.com/tc39/proposal-explicit-resource-managemen... to be used, once that gets added to JavaScript.)
> Which had nothing to do with Java or how it manages memory. You could have the same vuln in NodeJS or Python.
I think parent was pointing out that the biggest and costliest security exploit ever found had nothing to do with buffer overflows, memory management, etc.
It seems almost impossible to say if log4shell was bigger or more costly than Heartbleed or the Debian OpenSSL bug (there are probably still keys out there made with the damaged randomness). Log4shell is just in recent memory.
> It seems almost impossible to say if log4shell was bigger or more costly than Heartbleed or the Debian OpenSSL bug (there are probably still keys out there made with the damaged randomness). Log4shell is just in recent memory.
Seems pretty clear to me - Heartbleed (and all the other serious memory exploits) required a great deal of skill and a lot of luck to exploit, and in return you either don't get a remote execution, or you get a very tiny chance of a remote execution.
In comparison, log4j is about as easy to exploit into an RCE as it is to use curl.
Log4j is a guaranteed remote-execution exploit just by filling in a user-facing form with the correct URL, while memory exploits are not guaranteed to result in an RCE and require more skill than simply typing into an input box or an email.
> Log4j is a guaranteed remote-execution exploit just by filling in a user facing form with the correct URL
This is only true of systems that were using very outdated JVM versions... on the newer ones (we're talking 2016 or newer JDK releases, not like last month), you would need to pull off a serialization exploit to indirectly get RCE, which is quite a bit harder than sending an HTTP request.
> Heartbleed (and all the other serious memory exploits) required a great deal of skill and a lot of luck to exploit, and in return you either don't get a remote execution, or you get a very tiny chance of a remote execution.
Heartbleed wasn't about RCE at all. It was about memory disclosure -- memory that contained secret signing keys. The fallout was that keys needed to be revoked and rotated.
Reading out memory and extracting the secret keys was actually pretty simple. There were multiple POCs available.
"The root cause of the Spectre and Meltdown vulnerabilities was that processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11. This is essential because it allows C programmers to continue in the belief that their language is close to the underlying hardware."
Just because an academic has an opinion does not make it fact. That quote reads like a blog post, and with its lack of citations it might as well be one. Yes, it sounds plausible, but still.
Let's assume it's true: your argument would be "Intel made a mistake, but since they would have only made that mistake when doing stuff that appeases C programmers (would they have?), it's actually because of C."
Now, I think this is a bit of a stretch.
ETA: or did you mean that it has to do with C for that reason? in which case, ok, I see how you mean.
Oh yea, if not for emulating the PDP-11, processor designers would have no interest in instruction level parallelism.
This article is pretty funny actually:
> On a modern high-end core, the register rename engine is one of the largest consumers of die area and power. To make matters worse, it cannot be turned off or power gated while any instructions are running
Yea, let's just gate off the RAT (register alias table). What's it for again?
Java exposed the same abstract machine as a PDP-11 too. The key "PDP-11" thing is that all memory is treated as equally accessible, rather than the reality - i.e. that some memory is in caches on certain cores only and can therefore be accessed more efficiently on those cores.
How is it the biggest and costliest security exploit ever? With even the most basic of firewalls, which a server should absolutely have, it is not really exploitable. I’m not trying to downplay it, but I really don’t see how it is even remotely close to some ssh bugs.
No, it's not. I can easily configure open ports using firewall-cmd. That is basic security. But there are no dedicated options to configure outgoing calls, there are no sane defaults to start with (I have no idea which targets should be whitelisted: NTP? update servers? anything else?), and there's no system-wide integration; for example, my dnf can choose a different mirror every time it runs.
Of course it makes sense to configure an outbound whitelist, but there's no infrastructure for it in RHEL or Ubuntu, and nobody's going to bother with custom scripts for that.
The point is that not every security problem stems from the memory model, and myopically focusing on memory safety evidently doesn't do much to prevent vulnerabilities.
According to Microsoft's data, about 70% of security vulnerabilities are memory safety bugs. So definitely not all, but taking them off the table makes a big difference.
Another big chunk of bugs, including forgetting to escape strings, can often be reduced by building strongly-typed APIs that distinguish between "String", "Sql" and "Html" types.
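A minimal sketch of that idea in Rust, with a hypothetical Html newtype and escape_html helper (not a real library): the type system refuses a raw string wherever escaped markup is expected.

    // Hypothetical newtype: a plain String cannot be passed where Html is expected.
    struct Html(String);

    fn escape_html(raw: &str) -> Html {
        // Deliberately minimal escaping, for illustration only.
        Html(raw.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;"))
    }

    fn render_paragraph(body: Html) -> String {
        format!("<p>{}</p>", body.0)
    }

    fn main() {
        let user_input = String::from("<script>alert(1)</script>");
        // render_paragraph(user_input);        // type error: String is not Html
        let page = render_paragraph(escape_html(&user_input));
        println!("{}", page);
    }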
Java actually does quite well by these metrics. It's memory safe, tries to eliminate undefined behaviors, and it has an adequate type system. However, the mere existence of runtime code-loading is a risk, as we saw with Log4j.
I admit that I like C, but I would use it only sparingly (if at all) professionally at this point despite expert-level C experience and skill. However, that 70% figure is deeply saddening to someone coming at this from a C perspective - C is _dire_ in safety terms and I have personally found and fixed a large number of C bugs. The idea that it's _only_ 70% is pretty sad, because it means we are well and truly doomed.
I write this tongue in cheek, obviously. I've seen dire, dire security bugs in Java in particular but also in terms of fitting parts together (lo, broken ACLs, useless AWS SGs, inter-process assumptions that don't hold, injection vulnerabilities of myriad types, etc.). The truth is, we are doomed, and not just because of the 70%.
I'm not convinced this statistic is saying more than that these bugs are easily identifiable. There is a lot of tooling for identifying memory errors, and virtually none that could identify something like log4shell.
they are more easily identifiable, but they are also very simple. when writing C, you probably write a potential memory safety bug roughly every 100 lines. Even if you detect 99% with automated tooling that's still a pretty sizable attack surface.
every array access and string comparison is a potential vulnerability in C. I'm not saying that in practice, all of them will be vulnerable, just that every one of those is a place where you could miss a check and end up with a memory bug.
That's kind of a disingenuous way to reason about it. You absolutely can bounds check your memory accesses in C, and bounds-checked accesses are as safe in C as they are in any other language.
the point isn't that it's impossible to write C that's safe, it's that doing so requires a 100% success rate from a human implementing something correctly. furthermore, the main cases where C can get a performance benefit over safer languages are where there is a complicated invariant that ensures safety. the problem is that these are incredibly easy to break during refactoring, or when a different dev modifies the code later. compilers are much better than humans at verifying that code is correct.
It has everything to do with Java, though not memory. It exploits Java's write-once-run-everywhere feature as well as the decision to initialize classes, including running code, on load rather than on first instantiation.
Which argues that maybe language choice just doesn't matter that much for security. It's true that C allows a class of mistakes that don't exist in higher level runtimes. It's equally true that this class of mistakes represents an increasingly vanishing share of real world exploits. Static analysis tools and runtime hardening techniques don't "fix" C, exactly. But in practice they work well enough to push C's foibles down into the noise floor.
But at the same time, C remains, and will probably always remain, the easiest language to tune and optimize. It's not going anywhere. Our grandkids will still be using systems with C firmware at their core.
> Which argues that maybe language choice just doesn't matter that much for security
No it doesn't. There is actually no logical way to infer from the parent statement that language choice doesn't matter for security.
> But in practice they work well enough to push C's foibles down into the noise floor.
This goes against findings from research that has been done into the sources of security vulnerabilities. Microsoft and the Chrome dev team have published that 70% of their security bugs are a result of memory safety issues.
If C is used in 100 years it will only be because of inertia. Today there are better choices in the domain of low level systems programming languages.
> If C is used in 100 years it will only be because of inertia. Today there are better choices in the domain of low level systems programming languages.
I'm not saying you're wrong, but there's effectively zero real kernel work done in anything other than c and c++. Tons of well-known open source Rust/microkernel stuff exists here in public, but behind closed doors is where almost all firmware work is happening. When someone at Microsoft, Apple, Qualcomm, Samsung, etc sits down to code firmware for billions of devices, it happens in c. I've never seen a serious proposal to switch to managed code at any of my jobs, either
I think we'll see more and more complex stuff move out of the kernel, but I don't think c is going the way of COBOL in the next 20 years at least. I'm definitely not going to start using Rust on my own, and it would take a pretty compelling case from management or a junior engineer to make me switch in the future.
Web browsers are one of the most poorly designed applications in existence. It's not surprising that such complex applications trying to do everything possible have vulnerabilities. But in no way should that serve as a benchmark for C being inherently unsafe when far more important systems like databases, operating systems and system libraries are written in C just fine. Most common vulnerabilities in general are due to unnecessary complexity of systems that lend themselves to poor programming practices, configuration errors, low-calibre programmers, etc. (the OWASP list, for example). Memory safety vulnerabilities tend to just get more attention since they are in critical parts of a system, are hard to exploit, and so attract highly sophisticated exploits that affect us at a nation-state or industry level. If these systems were written in high-level languages by those high-level developers, I'd go nowhere near any computer system.
> Which argues that maybe language choice just doesn't matter that much for security.
I think it just means that these languages all have elevated potential for security issues, but there are languages without pervasive gratuitous dynamism or memory problems.
> Static analysis tools and runtime hardening techniques don't "fix" C, exactly. But in practice they work well enough to push C's foibles down into the noise floor.
But that’s all additional effort to integrate these tools and practices onto a language which already has a very low iteration velocity (all of the time spent debugging memory issues, package issues, build system issues, etc which simply don’t exist in many modern languages).
> But at the same time, C remains, and will probably always remain, the easiest language on which to tune and optimize. It's not going anywhere. Our grandkids will still be using systems with C firmware at their core.
This sounds like a concession to me. Of course C will smolder on in legacy firmware long after it becomes obscure—so did COBOL, but we don’t pretend COBOL’s vestigial existence is owed to its merits rather than a quirk of history.
This is a fine and normal thing. C did its job for a time, but languages are now emerging which are better suited to modern computing requirements. This process will continue, and the languages which are chipping away at C’s market share will be eroded themselves eventually.
> we don’t pretend COBOL’s vestigial existence is owed to its merits rather than a quirk of history.
Um. COBOL survived so long specifically because it did some things better than alternatives, mostly around how it handled numbers. Yes, also inertia and historical accident, but also because it was actually good at its job.
It beat out others of its day on merit, as did C, but we’re talking about C and COBOL competing against modern languages in a modern landscape. In other words, COBOL’s dominance in the 60s was due to merit, but its vestigial existence today is a historical artifact—it isn’t simply the best language for the application.
I guarantee Java would be vulnerable to the same category of errors even without runtime class loaders. Java puts a lot of emphasis on dependency injection, and has done this for a fairly long time. This takes the form of having classes pull dependencies themselves through some central registry over explicit construction.
It's arguably a symptom of a larger problem: the ecosystem's sheer size.
Dynamic linking is not a language feature, it’s a feature of the operating system. If we’re talking about dynamic loading, there are plenty of languages that don’t support this natively, but only through its C bindings (e.g. Haskell).
In what sense? In many popular languages (Perl/Python/Ruby/...) all code loading is dynamic. Java does have more of a built in RMI framework than most languages, but it's rarely used in modern code.
No, JNDI is an API for looking up objects in naming and directory services. It stands for Java Naming and Directory Interface. Not supporting LDAP, the most common remote directory, would have been a flaw.
The flaw was in log4j's use of the API, which by default allowed connecting to a remote server and executing code based on lookup strings that could be user-entered.
There are other factors. Not every application of C has the same amount of exposure to security vulnerabilities.
On GitHub I see people rewriting simple UNIX utilities, e.g. cat, in some "safe" language, e.g. Rust. Yes, it is possible there could be some exploit based on an error in cat, maybe triggered by some malicious input that the user fails to detect, but this would not be a concern to the same degree as errors in large, complex applications with network and other privileged access that are relatively new. For example, using C to write cat versus using C to write git.
Exposure to security vulnerabilities depends not only on the language used, but also when, where and how the application is used.
Security is only one reason to rewrite utilities in another language. In particular for these foundational utilities, it means more of your dependency tree can be reproducibly built without the help of specialist package maintainers. Other reasons also include “fun” and “learning”.
Considering that package management is a well-solved problem, rewriting these tools for that in return for a slow, bloated alternative sounds like a huge tradeoff to me. Fun and learning, on the other hand, sure.
AFAIK operating systems delegate reads/writes to their IO subsystem. This subsystem can optimize the IO operations by scheduling them in an order that differs from when they arrived to the IO queue and this improves performance in some cases. Instead of FIFO the subsystem operates on the segment of your storage device that is closest to its current location. Modern IO subsystems are both thread-safe and smarter than the average programmer.
I'm not so sure that package management is really a well solved problem. Sure, there are some solutions, but as far as I know all leave some things to be desired.
From my understanding, Rust performs more compile-time checks for safety. In those cases, you may be able to maintain the performance of C without sacrificing safety. On the other hand, you may still get away with writing more performant C code since C allows the developer to play fast and loose with the rules. I'm not saying that is a good thing, but git can get away with it since it probably attracts very competent developers.
The other thing mentioned in the post was eliminating overhead. My limited experience with Rust leads me to believe it is closer to C++ than C in this respect. That is to say there is a lot of higher level functionality that you can use and it is optimized for performance in the general case. On the other hand, that functionality is still for the general case, so a competent developer who has a thorough knowledge of what the program is trying to accomplish will be able to write more performant code.
That C allows developers to play fast and loose is actually one of the reasons why C is slower than Fortran in many cases. In Fortran the compiler may assume that arguments do not alias, and array dimensions are well-defined; this alone allows for much more aggressive optimizations that are impossible to do safely in C.
(Is Fortran still relevant? Try compiling Tensorflow, to say nothing of serious numerical modeling stuff. And yes, I know about __restrict; it's more of a band-aid.)
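To make the aliasing point concrete, here is a small Rust sketch (my illustration, not from the thread): because the two slices are an exclusive and a shared reference, the compiler may assume they do not overlap, which is the guarantee Fortran gets by rule and C only gets by adding restrict.

    // `dst` is exclusive and `src` is shared, so the compiler may assume they do not
    // alias and can vectorize/reorder the loop more aggressively. The equivalent C
    // signature `void scale_add(double *dst, const double *src, ...)` has to assume
    // the two pointers might overlap unless `restrict` is added.
    fn scale_add(dst: &mut [f64], src: &[f64], k: f64) {
        for (d, s) in dst.iter_mut().zip(src) {
            *d += k * *s;
        }
    }

    fn main() {
        let mut dst = vec![1.0; 8];
        let src = vec![2.0; 8];
        scale_add(&mut dst, &src, 0.5);
        println!("{:?}", dst);
    }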
Ha, I have some disgusting use of COMMON blocks from the 80's to show you. But that doesn't affect the optimizer's ability to assume that there's no aliasing, so you're totally correct about that.
There are some architectures that C supports better than Rust due to having gcc's backends as an option as well as LLVM, but within a given architecture that both support, I don't think there is.
Only thing that comes to mind immediately is VLA/alloca but I'd have to check. Most of the gnarly pointer things are fair game and work just as well as their C counterparts.
You're mixing up a thing you can do with a way that you can do something. You can write Rust code with the same observable behavior as Duff's device in C.
A good example of cool tricks to impress fellow hackers that have no place in code bases that should meet certain quality levels of long term maintenance.
Ownership rules still apply _to references_, but not to raw pointers.
This program prints "x: 12". The only difficulty is that Rust's designers have intentionally made this pattern difficult by marking raw pointers as !Send, but you can get around that by wrapping them in a Send type.
    use std::{time::Duration, ops::{Deref, DerefMut}};

    // Wrapper whose only job is to smuggle a raw pointer across the Send boundary.
    struct SendWrapper<T>(*mut T);

    impl<T> Deref for SendWrapper<T> {
        type Target = T;
        fn deref(&self) -> &Self::Target {
            unsafe { self.0.as_ref() }.unwrap()
        }
    }

    impl<T> DerefMut for SendWrapper<T> {
        fn deref_mut(&mut self) -> &mut Self::Target {
            unsafe { self.0.as_mut() }.unwrap()
        }
    }

    // The load-bearing line: we promise the compiler the raw pointer is safe to send.
    unsafe impl<T> Send for SendWrapper<T> {}

    fn main() {
        let mut x = 10;
        // Two raw pointers to the same local -- something references would never allow.
        let p1: *mut usize = &mut x;
        let p2: *mut usize = &mut x;
        let mut s1 = SendWrapper(p1);
        let mut s2 = SendWrapper(p2);

        // Both threads mutate `x` through their own pointer.
        let j1 = std::thread::spawn(move || {
            *s1.deref_mut() += 1;
        });
        let j2 = std::thread::spawn(move || {
            std::thread::sleep(Duration::from_secs(1));
            *s2.deref_mut() += 1;
        });

        j1.join().unwrap();
        j2.join().unwrap();
        println!("x: {}", x);
    }
> On the other hand, you may still get away with writing more performant C code since C allows the developer to play fast and loose with the rules.
I have some thoughts on how Rust performance compares with C!
I spent yesterday optimizing a sort routine in Rust. I need to sort records where the key length is known at compile-time, and the payload length is different every time sort is called. I expect the average sort call to involve at least 250 million rows and over 400 columns. Speed matters.
Here's what I've learned so far about the Rust-versus-C question:
- Rust has generics with monomorphization. This makes it easy to compile two different versions of the sort routine to run on different length keys.
- The very fastest version of my recursive base case, a 20-item insertion sort, currently uses unsafe code. This allows me to eliminate a couple of bounds checks that LLVM isn't eliminating automatically (a simplified sketch of this kind of code appears below). I may figure out how to do this with safe code before shipping. (In 5+ years of production Rust, I've never needed unsafe for performance. This may be the first time!)
- Performance often comes down to cache locality. Both C and Rust give me the fine-grained memory layout control that's essential. Also, this means that "detached key" sorts are almost always a bad choice, even when the alternative is repeatedly moving large record payloads around.
- The Rust "criterion" benchmarking library makes it super-easy to run valid performance tests.
- The Rust "proptest" allows me to generate large amounts of test data and verify properties like "reckless_sort always produces output matching std sort."
- "cargo fuzz" will be useful when I try to break the finished code. Seriously, it's a super-nice fuzzer workflow.
- Rust's standard quicksort (the unstable sort) contains all sorts of funky optimizations: it detects semi-sorted arrays, it breaks patterns, and it falls back to heapsort when things go wrong. It even marks "cold" paths for LLVM to improve instruction cache usage.
- If I get my sort working, the "rayon" library will make it utterly painless and safe to split up the recursive calls over multiple CPUs.
Overall, this has been a pretty fun experience. I'm not sure whether the finished product will use unsafe Rust in the inner loop. But the combination of generics, rayon, and tightly integrated test/bench/fuzz tooling has made this a very pleasant experience.
Verdict: I am entirely happy using Rust for hot loops. (Even if my current fastest inner loop does use "unsafe" to bypass a few bounds checks I'm not clever enough to eliminate otherwise.)
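To give a picture of the kind of "unsafe to skip a bounds check" mentioned in the list above, here is a deliberately simplified sketch (my own illustration, not the parent's actual code):

    // Simplified insertion sort. The loop bounds guarantee 0 < j <= i < v.len(),
    // so the unchecked reads are in bounds; the safe version is identical except
    // it uses v[j - 1] and v[j], which keep their bounds checks unless LLVM can
    // prove them away.
    fn insertion_sort(v: &mut [u64]) {
        for i in 1..v.len() {
            let mut j = i;
            // SAFETY: j and j - 1 are always within 0..v.len() here.
            while j > 0 && unsafe { *v.get_unchecked(j - 1) > *v.get_unchecked(j) } {
                v.swap(j - 1, j);
                j -= 1;
            }
        }
    }

    fn main() {
        let mut data = vec![5u64, 3, 8, 1, 9, 2];
        let mut expected = data.clone();
        expected.sort();                 // the "matches std sort" property from above
        insertion_sort(&mut data);
        assert_eq!(data, expected);
        println!("{:?}", data);
    }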
C can be written in a way that minimizes vulnerabilities. You use things like critical thinking, and you sanitize/validate inputs to functions and before calling libraries.
The log4j vulnerabilities and others like them, where input is not validated against what you actually expect, would not happen in properly paranoid systems.
There are approximately zero codebases of nontrivial size that are written in memory unsafe languages and free of serious security holes that would be impossible in safer languages. That is pretty good evidence that the measures you suggest are not very effective.
Wasn’t that generated from Haskell source rather than handwritten, after years of formal verification of not only the source but also the C code generator and the final machine code?
afaik, the c is handwritten for performance reasons, and is manually proven to be a refinement of the executable haskell model. the haskell is then proven to be a refinement of an abstract, more succinct spec.
With endless bug-hunting in hastily-written C code, it could be fewer than 4 lines a day of completed code. (I hope nobody actually does so poorly, except while learning C.)
seL4 was released as open-source in 2014. I know of the helicopter drone; have any other projects made use of seL4? If not, why not? My naive guess is that it's too hard to build on the project without introducing vulnerabilities and therefore negating the benefits of the kernel, but I'd love to be shown otherwise.
it's kinda funny really. what you gain by eliminating memory bugs you lose in boundless complexity sprawl that is basically encouraged by the design of the language and its major libraries.
The complexity isn’t new, it’s just moved from manually managing dependencies or reinventing wheels into more fruitful, higher level applications. Of course, some of these are frivolous, but probably no worse than spending one’s complexity budget wading through dependency hell.
There is a single memory error CVE there, the last one.
Assuming a higher-level language would not allow that without introducing any other security problems, and assuming the CVEs roughly represent the security of the software, then using a higher-level language would be a pretty small marginal improvement in security.
Have any of the relevant limitations been removed in the last decade+? I don’t believe so.
Project Valhalla is 9 years on, is there an end in sight?
When Java & the JVM were first released, the treatment of primitive vs. what we now call “boxed” types received a lot of criticism, mostly of an aesthetic variety. In retrospect, I wonder how much bigger the Java and JVM ecosystem would be if user-defined, non-reference types had been there from v1.0.
https://news.ycombinator.com/item?id=29666279 confirms that they're actually working on it now but it's such a monumental effort. A few years wasted before JDK8 were very damaging. That's what made people look for alternatives to the JVM, not its treatment of primitives.
If Loom and Valhalla come to fruition in a couple of years the JVM would be competitive against golang. But the problems golang has to catch up with Java (e.g. error processing and migrating the standard library to generics) seem easier to fix.
A big ecosystem isn't necessarily good. Although big is relative, it can mean "mature, well-known, multi-featured, well-supported, well-documented", it can also mean "Complicated, hard-to-reason, hard-to-pick a library, hard to know which of the many approaches is correct".
In general, I would argue that most frameworks start pretty sensible and consistent and are much easier to use and they get worse over time as they try to be all things to all people and get spread too thinly.
Well, as per the often referenced ‘No Silver Bullet’ paper, the only way to significantly improve productivity is to reuse code — no managed language provides an order-of-magnitude productivity increase over another. So I have to disagree: the ecosystem (which in the concrete JVM case has multiple competing solutions to most problems, with quite good quality) is, like, the most important factor.
One that might is Julia. Getting a like-for-like comparison is hard, but DifferentialEquations.jl is roughly 10x smaller than PETSc/odeint, faster, and more fully featured.
I don't think this is a meaningful limitation any more. Go's ecosystem is big enough - I've been using it for a few years now and have only found a few esoteric things that aren't supported (and nothing big enough to be a showstopper).
The quality is also really good; I've been impressed by many of the libraries we've picked up. Especially in light of vulnerabilities like log4j or that similar Struts one Equifax had a few years back, I'd much rather be working in the Go ecosystem than Java's.
There are many niches where Java & the JVM are absent or fighting an uphill battle to occupy at all. Of those I’m aware of, the only ones where value types wouldn’t help fitness tremendously are on the client side, already a lost cause.
What would those niches be? Non-OpenJDK JVMs occupy even ultra-niche use cases like hard real time, military usage, and embedded (though arguably Java ME is just a subset of Java, so you may or may not consider it the same), and the other direction of crazy performant server machines is well served by OpenJDK — apparent in how it is used on a very large percentage of all web servers, with top companies relying on it almost exclusively (Apple’s servers are mostly Java, Google has plenty of Java applications, Alibaba is a Java web shop, Twitter runs Scala with Graal, etc). Just an interesting note: OpenJDK’s GCs can handle heap sizes well into the multi-terabyte range.
CPUs today have more instructions per clock, so things like the extra "& 0xFF" might not affect performance at all. Similar for conversions to/from types: those are compute on data already in cache/registers, so they incur extra instructions but not extra memory fetches, and might be free (depending on the µop cache).
Boxing is still bad, because it often means more uncached memory fetches. Container types that hold references instead of values are also still bad. I notice many higher-level languages are putting more emphasis on adding value-types and unboxed options recently, because memory indirection is a comparatively worse problem for them than it used to be.
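In Rust terms, the indirection being described looks roughly like this (a toy illustration, not a benchmark): a container of values is one contiguous buffer, while a container of boxed values adds a pointer dereference per element, analogous to a collection of boxed objects in a managed language.

    // Values stored inline: one contiguous buffer, cache-friendly sequential reads.
    fn sum_inline(values: &[i64]) -> i64 {
        values.iter().sum()
    }

    // Values behind pointers: every element costs an extra fetch from wherever
    // the box happens to live on the heap.
    fn sum_boxed(values: &[Box<i64>]) -> i64 {
        values.iter().map(|b| **b).sum()
    }

    fn main() {
        let inline: Vec<i64> = (0..1_000).collect();
        let boxed: Vec<Box<i64>> = (0..1_000).map(Box::new).collect();
        assert_eq!(sum_inline(&inline), sum_boxed(&boxed));
        println!("{}", sum_inline(&inline));
    }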
Memory is faster and caches are larger, so things that compute on bulk bytestream reads (initial loading of program code, reading large packed .git directories, etc.) should be more consistent across languages compared to 2009. Back then, it was easier for language runtime code (GC, JIT, etc.) to push data out of cache and trigger more memory fetches.
I haven't benchmarked JGit or git, these are just my prior assumptions.
> That could have been an actually useful instruction
Only if the data dependency graph is broad enough. There are certain code patterns that are clock-cycle-dependent and run with similar performance on older CPUs and newer ones, but they're not common.
Something like git is used broadly enough, there's probably something in some other process to do(like render ads in your browser), but looking strictly at git I would guess it's difficult to fill an entire CPU pipeline with no gaps.
I think Prime95 is optimized enough to do that in one process, and Intel recommends against running it because it will damage the CPU. Usually code gets blocked on a memory fetch or something before it gets to that point - code is never that well optimized.
> Intel recommends against running it because it will damage the CPU
Citation? A cursory search suggests this is a fable.
And your argument boils down to (or seems to) the idea that there is always an imperfection, so we might as well not try or care about other imperfections. Which I kind of understand, but I don't think it's a good mindset. If we didn't have to waste cycles on stuff like "&255" because of language or other self-inflicted limitations, no matter how fast those cycles are completed, we'd have faster software.
The description only confirms a "hang", not permanent damage. And the Errata SKL082 says: "Under complex microarchitecture conditions, processor may hang with an internal timeout error (MCACOD 0400H) logged into IA32_MCi_STATUS or cause unpredictable system behavior"
so it sounds like very optimized code with AVX instructions may have caused some internal components to violate their clock-SLA because it can't run them that quickly. So "CPU damage" is overstating it, and I don't keep up-to-date with CPU bugs to see if there's been persistent problems in this area.
> no matter how fast those cycles are completed, we'd have faster software.
I think "speed" is not a good way to think about CPU instructions. I usually think of "resource pressure":
If a function uses many CPU registers at once, it puts more pressure on the 16 named registers within one process.
If there's lots of random memory fetches everywhere, it pressures the data dependency graph to be broader to make the same progress.
If there's lots of arithmetic operations, those consume ALUs.
Extra instructions consume memory bandwidth, decoder queue spots, and iop cache.
etc.
and it only impacts speed if you hit a threshold and something has to block waiting to obtain one of these. So while it's good to remove pressure if there's no drawback, it won't necessarily translate into a direct speed improvement unless you were blocked on that specific resource.
It sounds pedantic, but CPUs nowadays have so many resources available that you can be quite wasteful and still not see any slowdown. It's not good for software developers to use the word "faster" when they mean "release of pressure on resource X"; the latter is the better approximation.
In the specific case of "& 0xFF", it's a clear win. But Java has many other benefits like profile-guided JIT and a GC with different memory pressure than malloc(). If you only think in terms of "faster" or "slower", you won't know how to aggregate all the things that Java and C do to come to a conclusion.
C is well-known for having poor performance because of pointer aliasing ruining optimizations. "Optimizer pressure" can be another idea, making the optimizer work harder to prove some code can be simplified until eventually it hits a threshold and can't. Java doesn't have unrestricted pointers so the references are easier to track and prove correct.
Some interesting points, thank you. I guess if I wanted a final answer I'd need to look into the specification sheet for a specific CPU to see to what extent it can parallelize and or reorder instructions or perhaps even break them down into "micro-ops". And then a new CPU comes out and I'd have to do it again. Maybe, for a higher level programmer (I'd call myself that), this is indeed a waste of time for very little gain.
HotSpot usually optimizes the signed-to-unsigned conversion away. This is quite evident when looking at the generated machine code when accessing byte arrays. Although I still find it quite silly that the original post is even considering this an important point when the biggest bottleneck for git is the file system.
Actually, the &0xff will depend on the previous computation, so it won't be executed in parallel, and while very cheap it will increase the latency of an operation that might be on a critical chain.
Are you claiming that because modern CPUs have higher instructions per clock count means you can insert useless instructions without performance impact? That's ridiculous.
JGit has advanced a lot, and is in use in software handling huge repositories like Gerrit code review system. Java has also had numerous advancements in the past 10 years.
It would be interesting to measure git vs JGit performance today.
It would also be interesting to have JGit developers comment on performance after all this time.
There's a thrill to coding in C which is similar to using a great chef's knife. I'm no master with either, will probably hurt myself -- but the limitation is me and my own.
My opinion evolved over the years to the opposite of this, and I really loved C in the '90s.
C is like the old sawing machines. Hospitals saw a never-ending stream of cut-off fingers, so every recent model has safety features. Old timers claim you are a responsible adult and should just watch out, and technically they are correct. They hate the safety features, which indeed make their job harder.
But it is humanly impossible to be 100% responsible at scale. You will have a bad day someday, or be sick, or whatever, and it will cost you a finger. In a team, two people will differ in what 'responsible' exactly means, and a dumb error will eat another finger and cause a lot of angry fingerpointing.
C is the same. A group of people, at scale, can't write good enough software all the time. Dumb tiny errors will sneak through, and C will punish us with instability or security holes. If we want to build any trust in future software, if our profession wants to be engineers instead of hackers, we will require some safety features in our tools. Even if this costs us short term in some extra development time or runtime inefficiency.
Definitely agree, and even more so today: these systems run everything. Back when I started programming, most banks didn't even have online accounts, so you mostly got to choose whether you used an app. Nowadays you are using them whether you like it or not, and the developer market, although still tight, is much more prolific than before, meaning there are plenty of average or not-so-good devs now compared to back when it was a niche thing for the cleverest (or most careful?) people.
That raises the immediate question: why would anybody continue to hire Java developers?
I was thinking about that recently and two things came to mind:
* The JavaScript ecosystem is incredibly immature. You can deliver some amazingly supreme trash in this language and it will likely execute, most of the time. Nearly everybody is deathly afraid of the platform (the DOM) and requires the world's largest frameworks merely to put text on a webpage, even though familiarity with the DOM means understanding just a few methods.
* Most Java developers are trained by schools.
After thinking through those two points I have to realize that employers would rather burn money than train developers, or pray and hope that with enough open-source tools, dependencies will compensate for the maturity gap.
I had a look at a couple of the Java benchmarks you've linked to and they are horribly written: completely disregarding stuff like OSR, forgetting about warm-up, and riddled with absolutely unnecessary allocations.
Either someone did this on purpose or they have no idea how to write efficient Java code and benchmark it.
The benchmarks game is always full of these things. Some people are very motivated to make their language shine on it, so it has good[1] implementations for those; almost everything else is bad. Really, really bad. It's best to just ignore it.
[1] With good defined as fast. Many of the implementations there are so contorted that no one in their right mind would use anything like it in production. But they are fast.
Yep. There were two implementations of whatever in Ada, but one of them failed to print the right output, so was disregarded in favor of the slower, but correct one. I fixed the implementation and now Ada is way, way faster. If I remember correctly I actually had the conversation with whoever is responsible for that website here on HN, and I posted the correct code here (via a link).
Are you asking specifically about what I had to do to fix the faster, but previously incorrect implementation? It is on HN somewhere. Other than that, there is no problem. Perhaps people who took the site too seriously may have gotten the wrong idea about the language's performance.
You answered "Yep" to someone's comment that the benchmarks game was "Really, really bad." and that didn't really seem to be the experience you described?
> Some people are very motivated to make their language shine on it, so it has good[1] implementations,
I said "Yep" as in "Yep, some people do contribute", and it has good implementations for some, while shitty ones for others, and it should not be taken way too seriously because of what he just said.
Yup, enough just to give me a little nudge towards the direction I was already leaning! If my favorite language is slow, I will not take it seriously at all. :)
Not warming up hotspot seems to be a constant in benchmarks for people comparing languages with Java. I mean, it's probably valid if your use case is a CLI tool (although, in that case an AOT compiler is clearly a better option)
Thanks for that. JVM startup time used to be an issue which doesn't appear to exist anymore given those numbers - although, as that reference said, it would still matter for something that finishes very quickly.
TIL
If I'm not mistaken, the hallmark of the benchmarks game is to use similar-looking source code and see how fast it runs when written in different languages. Basically, it presumes that all languages follow the same programming paradigms.
"So we accept something intermediate between chaos and rigidity — enough flex & slop & play to allow for Haskell programs that are not just mechanically translated from Fortran; enough similarity in the basic workloads & tested results."
I love how you insist on using the "vastly superior" phrase to mock something you haven't even seen but have strong opinions about right off the bat.
Not taking the bait, anyone who can read and comprehend Java code will tell you the same. What I refer to is a standard way of approaching Java code benchmarking.
Apart from the questionable code quality of the benchmarks: JavaScript can be impressively fast in very small benchmarks, but things change very quickly with real applications.
Java has way more information, thanks to the type system, and can do a much better job with optimizations.
Interesting historical perspective. I heard that modern Java addressed some of the shortcomings mentioned in the post: it added a type similar to a C struct, and unsigned types.
Does anyone know if a (partial) rewrite of git has been attempted in Rust yet?
Might be worth a shot — should be similar performance with stronger guarantees, and last I checked git was a cesspool with bash and Perl mixed in all over the place.
The highest-performing JVMs available are written in C. So, if you can get something to perform well in Java, you can do it in C too: just ship the Java code embedded in the JVM.
I've had this argument a few times before. My company is full of javaheads who keep claiming that Java can be faster than C for some specific workloads.
My argument has always been and still is...a language x written in language y cannot by definition be faster than language y.
I understand that languages by themselves aren't fast, but it seems that point should be clear.
> language x written in language y cannot by definition be faster than language y
That’s just... false. Languages compile to machine code, one way or another. Java’s JIT compiler doesn’t create C code, it creates machine code. So it could very well be written in Brainfuck and still be theoretically better than C.
Agreed. As an existence proof, the optimising compiler for language X could realise that a for() loop that uses 95% of the actual runtime CPU has no side effects and can be removed. There may be some feature or property of language Y that makes that analysis impossible.
In the Java case, the JIT performs a runtime analysis. Without wanting to make any assertions about overall performance here, there exists a non-zero set of programs for which runtime optimisation is superior to static compiler analysis.
I can second this. C# has had AOT and JIT compilation for years, and the JIT is almost always faster. It is really hard to predict where to optimize, devirtualize and inline if you don’t know the runtime behavior. (Note AOT != ngen; I am talking about the AOT developed for Microsoft’s mobile efforts.)
Static compilers can do PGO as well. In practice though, the benefits of PGO in languages like C or C++ that rely on static dispatch are small enough that most people don't care, except for programs that are extremely performance sensitive (compilers, browsers, AAA games etc) where shaving off a few % can make a difference.
What JVM wins in PGO, it loses in other areas, and the end result is often still much slower.
But if my understanding is correct, PGO doesn’t help much with “speculative optimizations” — e.g. Java can almost always elide virtual calls when only one loaded class implements an interface, reverting back to virtual calls on class load.
No matter the PGO, these and similar optimizations can’t be done without self-modifying code, which is more or less what Java with a JIT is.
You are wrong. The JVM can do runtime optimisations that you won't get from ahead-of-time compiled code. Trivial example: constant-folding values that are only constant at runtime. That doesn't mean that code running in the JVM is always faster - most of the time it won't be. But it is possible for particular workloads to run faster than in C.
What if the runtime for language X (written in Y) rewrites X programs into the same language Z that Y itself compiles to? Yeah, there are up-front costs, so you're still technically correct, but I suspect this isn't something you had in mind.
It is what I had in mind. Again, I'm being pedantic. If language X written in Y can rewrite to superfast Z, then you can, in language Y, write all of language X to mimic the results. Sure it's not realistic, but neither are language comparisons.
If you really believed that argument you wouldn't write C, because it's slower than assembly.
Do you actually write C programs that profile themselves and tune their code according to the input data? If not, then the fact that it's theoretically possible to do that seems pretty irrelevant.
Theoretically, your point stands. But practically, it's not a valid conclusion unless you're saying that it's practical to, say, rewrite all 27M lines of C code in the Linux kernel into optimized assembly for multiple CPU architectures. Or even for the various x86-64 families.
I'm not sure how valid this comparison is, but PyPy (written in Python) is well known for being a much faster implementation of Python than the main implementation (written in C).
The problem is that different languages do different things at runtime and this is part of the definition of the language. C has almost no runtime, while the JVM is allowed to recompile code at runtime using profiling data it has collected that very run. This is how it can beat C in certain workloads.
This seems a bit wrong-headed. All code ultimately runs CPU instructions, regardless of the language a program is written in. It doesn’t matter if the Java compiler or runtime is written in C, because CPUs don’t run C. The Java bytecode compiler might be written in C, but it ultimately emits CPU instructions.
When talking about performance you say that “there's literally nothing Java can do that C cannot because C can write Java” but other than the tautology that they are both Turing machines, your conclusion doesn’t follow.
The Java runtime dynamically optimises based on the currently executing code, but C compiles to static CPU instructions. A C program is statically optimised so it’s possible that some optimisations in C will perform worse than those in Java because Java knows more about the actual code paths taken at runtime, and this can change between executions.
For C to be able to optimise itself the same way that Java does, it would need its own heavyweight runtime and bytecode, like Java has (as I think you suggest). But C doesn’t actually have such a runtime, and although I’m certainly no expert on the C language spec, I assume it has all sorts of runtime guarantees and a memory model that would preclude arbitrarily rearranging the executable code at runtime, so it’s arguable that the resulting system wouldn’t actually be “C” if it did this.
Put another way: although Java is/was written in C, the way Java works is fundamentally different to C, and because of this, certain optimisations are available to Java that are not available to C.
(All of that said, in my experience C code is always faster than Java code, these days I use Go and Swift, and I’m very happy to have seen the back of the JVM.)
This is what I used to think about JIT as well: that the JIT would have more information than a static compiler and so could optimise the code better. However, it seems to me that most of the performance depends on how well we can exploit cache lines and how many fewer instructions the CPU has to execute to get the job done. JIT could perhaps improve on the latter, but in many cases it seems very hard to be cache-aware when programming in Java, not to mention that every user-defined type is allocated on the heap. So I feel it's quite unlikely that there are Java programs that outperform C, and even then the same thing could in principle be done better in C.
Yeah I don’t disagree with you, I was really trying to dispel the idea that somehow the implementation language puts an upper limit on the performance of the implemented language.
So, consider a perfectly spherical Java language… :)
Actually, Java gets ridiculously close to C performance when you only do computations on primitives. Of course you do have some pointer chasing in most code bases but in my experience cache is mostly relevant for repetitive workloads. I’m not sure that a typical web application would win all that much from a C rewrite.
Java is not being rewritten to C though. It is being rewritten to assembly (by a C++ program).
----
Also, by the same argument you're using, git should be faster if written in assembly; since assembly can do more than C can.
The tradeoff between performance and easiness to write the program has already been made by writing git in C. The only difference in using Java is a different point in that tradeoff.
I'd also contend that Java gives you much more bang for the buck in that tradeoff than C. Java is stupidly easy to write, especially if you want to do multithreading. With C you'd have to start writing some form of primitive GC, or have the disciplined paranoia of Rust's borrow checker yourself.
If I write an assembler in C, I could use it to write object files with fancy machine code that the C compiler would never choose. Same with HotSpot, it’s not limited by the code generator it was bootstrapped from.
“Does my C compiler generate better code than HotSpot?” is a different question than “could I conceivably write a better code generator from scratch, in any language?”
Theoretical max performance is different from actual performance - by that argument, since all code on a given machine ends up as machine code, all languages are the same speed - which is obviously false.
But not all languages emit the same machine code. And I think performance is more sensitive to memory access than machine instructions, so I don't think this point is true.
The point is that a language can be faster than the language it is written in - entirely or for the most part - depending on how it is used. Saying "the theoretical performance of Java can never be faster than the theoretical performance of C" is a tautology and pretty useless - especially if in practice the Java libraries handle normal operations better than the average C programmer.
Ok I see. I still find that this kind of argument relies on hidden assumptions. By this argument we are saying that Java doesn't actually exist, and it's all C with a fancy DSL. :-)
You could maybe write some kind of translator in C to emit JVM bytecode, and have it recompile itself at runtime. The problem with that line is that it is a lot of work and then to be fair we should allow a similar effort to be applied to a JVM like HotSpot for an apples-to-apples comparison.
Then remember the original thesis, which is that we claimed Java was only faster for specific usecases. That is a very wiggly claim that I bet could nearly always be found true.
It actually runs slower, as it is first an interpreter and the compiler is yet to be optimized to GCC levels; but it has a great compiler infrastructure that could eventually JIT to faster code (it being a runtime compilation that has statistics, and Graal being capable of partial evaluation).
I was merely pointing at a hole in your argument: the language of the compiler has absolutely no influence on the quality of its output (again: the JVM compiles to assembly, not to C++).
A compiler written in Brainfuck could emit code that is more performant than what gcc emits.
The language does have an influence on compilation speed, though. But does that matter for your argument?
That's not the statement you made. A safer statement would be, "a simple interpreter for language x, implemented in language y, cannot be faster than language y".
My statement is that language speed comparisons are stupid, but if you're willing to make one, you cannot beat the language you're written in. That's like saying Shakespeare writes better stories than...English.
Even if that position turns out to be correct, the argument is deeply flawed. It hinges on a "by definition" where there is no definition. That hand-wave is unsound. Java/the JVM has access to runtime information about how code is actually used that an ahead-of-time compiler doesn't have. There is no reason it can't outperform C. More information available might lead to better results, and sometimes does.
> Did you also implement object pooling for the Java variant (commonly used in high perf apps)?
In the specific case I don't think you need to; I've seen generated code (from java sources) simply reuse an object in a tight loop. IOW, it doesn't allocate new memory for the instance within a loop, for each invocation of the loop. The memory for the instance is allocated once and then reused.
(For a small allocation (a small instance) I would expect a smart compiler to not allocate anything and simply create the for-loop instance on the stack).
The optimization you are getting at has not much to do with object size, but subsequent usage. If the object reference escapes, it has to be allocated on the heap. Value semantics could/will help here.
> The optimization you are getting at has not much to do with object size, but subsequent usage.
Size plays a part: it determines whether or not an instance first gets allocated on the heap or the stack[1]. Heap allocation gets expensive in a tight loop.
> If the object reference escapes, it has to be allocated on the heap. Value semantics could/will help here.
The assumption is that we are talking about local-only data objects (not returned or outliving the scope). Forgive (and correct) me if I am under the incorrect assumption.
[1] I'd expect a smart compiler to do this: a data object that requires 1MB should at no point be on the stack, while a data object that requires 32 bytes has no business starting the allocator, causing a context switch to the kernel that faults a new page. The specific thresholds are dependent on the runtime and OS support.
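For what it's worth, here is the same idea expressed in Rust, where the stack/heap split is explicit rather than left to escape analysis (a sketch for illustration only; it says nothing about what a particular JVM actually does):

    struct Point {
        x: f64,
        y: f64,
    }

    fn main() {
        let mut total = 0.0;
        for i in 0..1_000 {
            // `p` never escapes the loop body, so it lives on the stack (or in
            // registers); no allocator is involved.
            let p = Point { x: i as f64, y: 2.0 };
            // let p = Box::new(Point { x: i as f64, y: 2.0 });  // heap allocation
            //                                                   // on every iteration
            total += p.x * p.y;
        }
        println!("{}", total);
    }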
For these sorts of micro-benchmarks you can usually attain interesting results with Java. I once constructed a benchmark specifically designed to show how the JVM's ability to inline and un-inline code could make certain programs much faster than C.
> My argument has always been and still is...a language x written in language y cannot by definition be faster than language y.
While this may be true in theory, in practice it is often not true as usually there are time constraints.
Working in a low-level language Y can take much more time and effort than a high-level language X. This can mean you can profile and iterate on the program design to improve performance much faster in X than in Y. Worse, if you have optimized a program in Y following algorithm A, it would be much harder to switch to optimizing following algorithm B, because almost always "optimization" makes program logic more complex and removes clarity. Sometimes so much so that you may not even try an alternative! Such increased complexity can also happen in an HLL program, but a modular program can often be rewritten more easily.
A trivial example: C uses strings that are zero-terminated but whose length we do not know, so it is necessary to count the string length whenever we need it.
If you implement, say, a Pascal compiler in C, the compiled Pascal can be faster than C because Pascal keeps the length of the string together with the string.
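A toy illustration of the counted-string point in Rust (the helper name is made up): a &str carries its length, so asking for it is a field read, while a zero-terminated buffer has to be scanned.

    // Hypothetical stand-in for strlen(): an O(n) scan for the terminating NUL.
    fn c_style_strlen(bytes: &[u8]) -> usize {
        bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len())
    }

    fn main() {
        let counted = "hello world";             // length stored with the data: O(1)
        let zero_terminated = b"hello world\0";  // length recomputed on demand: O(n)

        println!("counted:         {}", counted.len());
        println!("zero-terminated: {}", c_style_strlen(zero_terminated));
    }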
A language that does not permit pointer arithmetic can optimize the memory layout on the fly, based on real world program performance and that may be faster than a language it is written in.