TL;DR: glibc’s malloc doesn’t work well for server workloads. Nice deep dive into the available options to trim it, but the FUD around switching to something like mimalloc or the most recent tcmalloc is unwarranted, considering both have seen large-scale deployments and have sane defaults out of the gate that don’t need tuning (and any tuning you do for glibc would carry similar validation requirements).
I assume what the GP actually meant was that glibc's allocator isn't tuned for any specific workloads. It tries to be a jack-of-all-trades, and in doing so, just ends up being okay-ish.
As for what allocator you should be using, it depends on your workload. You'd need to test out some different options to see what works best. And "best" might also be something you have to define. Maybe you're ok with higher baseline memory usage if allocations are faster. Or maybe you want the lower memory usage.
Actually, mimalloc and tcmalloc both show that across almost any workload they outperform glibc. It’s difficult to outperform everywhere of course, but in terms of “works best without knowing the workload a priori”, tcmalloc and mimalloc should handily trounce glibc. Now, if you know your workload and start tuning glibc, maybe it can outperform? Not sure, but I’d be skeptical. Tcmalloc and mimalloc both apply really advanced techniques, some of which also require newer kernel support (restartable sequences for tcmalloc’s per-CPU caches, iirc). Glibc’s allocator by comparison is much longer in the tooth and just can’t compete on that.
Most workloads don’t care of course, but glibc’s penchant for hanging onto RAM when not needed is one very user visible consequence of this.
Tcmalloc (not the gperftools variant but the new one) and mimalloc would be the 2 I would try first.
I am honestly surprised no distro has bothered replacing the system default, since a better default allocator would free up RAM: both tcmalloc and mimalloc do a good job of knowing when to release memory back to the OS (not to mention that they’re generally faster allocators anyway).
In the early days Java’s allocator was quite a bit faster than C++’s, in part because C++ memory allocation was not fully concurrent, which made allocation part of the sequential fraction in Amdahl’s law.
Where we are now is better, but has its own problems.
Not really. Lots of Rust users come from high-level languages, so when people come in and complain about Rust being unexpectedly slow, allocation issues are in the top 5 at least. It’s incredible how shit the standard allocators are in almost all systems.
Meh. Users not used to tracking allocations keep to the same regime (lots of allocations), but because the platform allocators are utterly terrible the allocation overhead is orders of magnitude higher than in even a relatively basic runtime, thus the program is dog slow.
This is not inevitable, platforms could provide allocators which are less awful. Obviously they can’t be as fast as specialised runtime facilities but when you see the gains many applications get by just swapping in jemalloc or some such…
> the platform allocators are utterly terrible the allocation overhead is orders of magnitude higher than in even a relatively basic runtime, thus the program is dog slow
Huh. I’ve seen lots of people write slow Rust code because they didn’t realize there were allocations. But the complaint is usually “my Rust program is only 5x faster than my equivalent Python program instead of 100x, how come?”
Can you point out some good blog posts about Rust or C++ being slower than a “basic runtime” due to inferior allocator?
I would simplify it and say it's an issue with developers who don't understand manual memory management (allocating memory pools up front is just 1 such strategy for manual memory management).
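To make "allocate up front" concrete, here is a minimal Rust sketch, not anything from the thread itself. The function names `format_each_alloc` and `format_reuse` are made up for illustration. The first version asks the allocator for a fresh `String` on every iteration, the second reserves one buffer before the loop and reuses it.

```rust
use std::fmt::Write;

/// Hypothetical example: allocates a fresh String on every iteration.
fn format_each_alloc(items: &[u32]) -> usize {
    let mut total = 0;
    for item in items {
        let line = format!("item={}", item); // new heap allocation each time
        total += line.len();
    }
    total
}

/// Hypothetical example: same work, but one buffer is reserved up front and reused.
fn format_reuse(items: &[u32]) -> usize {
    let mut buf = String::with_capacity(32); // single up-front allocation
    let mut total = 0;
    for item in items {
        buf.clear(); // drops the contents, keeps the capacity
        write!(buf, "item={}", item).expect("writing to a String cannot fail");
        total += buf.len();
    }
    total
}

fn main() {
    let items: Vec<u32> = (0..1000).collect();
    // Both versions produce the same result; only the allocation pattern differs.
    assert_eq!(format_each_alloc(&items), format_reuse(&items));
    println!("total bytes formatted: {}", format_reuse(&items));
}
```

Whether the difference matters depends on how hot the loop is and how fast the underlying allocator is, which is the point the thread keeps circling.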
This is akin to the idea that a compiler will eventually solve performance problems.
We're still waiting on that to happen.
There's a difference between a developer who understands manual memory management choosing to use a tech that does it for you and a developer choosing to use a tech that does it for you so they don't have to learn manual memory management.
Manual memory management is table stakes for any halfway competent developer.
Otherwise known as Victim Blaming. C developers are that old guy who white knuckles everything and calls people weak for wanting help.
Many of the features in modern programs didn’t exist in the ’90s not because they hadn’t been thought of but because people spent all their energy getting the first fifty features to work reliably. Every app had a couple cool features. Most of them are de rigueur now because we can.
If Rust isn’t doing escape analysis for stack vs heap allocation then what is even the fucking point of this language? I would have thought that was the first thing implemented.
Rust doesn’t allocate anything on the heap unless you tell it to. When you tell it to, it puts it on the heap. The target use case is as a systems language.
Escape analysis as you’re alluding to isn’t needed in this model because the amount of times this helps you (ie you put it on the heap but the compiler can figure out it can live on the stack) is about 0. You need escape analysis in managed memory languages where everything is nominally a heap allocation and the compiler is responsible for clawing back performance through escape analysis.
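A small sketch of what "unless you tell it to" looks like in practice. The `Point` type and both function names are invented for this example. A plain `let` binding requests no memory from the allocator, while `Box::new` is the explicit opt-in to a heap allocation (the optimizer is still free to remove it if it can prove that is safe).

```rust
/// Hypothetical type used only for this illustration.
struct Point {
    x: f64,
    y: f64,
}

/// The value is a plain local: no allocator call is requested.
fn on_stack() -> f64 {
    let p = Point { x: 3.0, y: 4.0 };
    (p.x * p.x + p.y * p.y).sqrt()
}

/// Box::new is the explicit request for a heap allocation.
fn on_heap() -> f64 {
    let p = Box::new(Point { x: 3.0, y: 4.0 });
    (p.x * p.x + p.y * p.y).sqrt()
}

fn main() {
    // Same result either way; only where the Point lives differs.
    assert_eq!(on_stack(), on_heap());
    println!("length = {}", on_stack());
}
```

The less obvious allocations people trip over tend to hide behind library types such as `String`, `Vec`, and `format!`, which allocate on the programmer's behalf.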
Rust used to use jemalloc, which was arguably the better default. For some reason they switched to the system allocator when they added support for specifying the global allocator.
jemalloc was not the default on every platform, but was on some. It adds a non-trivial amount of binary size to every program, even ones that don't need the performance boost, and that is an area Rust is often criticized vs C.
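For readers who haven't seen the mechanism, this is roughly what specifying the global allocator looks like using only the standard library. The `Counting` wrapper, the `ALLOCS` counter, and the `allocs_during` helper are hypothetical names for this sketch. It forwards every request to `std::alloc::System` and counts calls, which is also a cheap way to find out how allocation-heavy a piece of code is.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and counts alloc calls.
struct Counting;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This attribute is the hook: every Box, Vec, String, etc. now goes through Counting.
#[global_allocator]
static GLOBAL: Counting = Counting;

/// Number of alloc calls observed while `f` runs (other threads may add to it).
fn allocs_during(f: impl FnOnce()) -> usize {
    let before = ALLOCS.load(Ordering::Relaxed);
    f();
    ALLOCS.load(Ordering::Relaxed) - before
}

fn main() {
    let n = allocs_during(|| {
        let v: Vec<u64> = Vec::with_capacity(16);
        std::hint::black_box(&v); // keep the optimizer from removing the allocation
    });
    println!("allocations observed: {}", n);
}
```

Swapping in jemalloc or mimalloc uses the same `#[global_allocator]` slot, with the allocator type supplied by a third-party crate instead of a hand-written wrapper. That extra dependency is where the binary size cost mentioned above comes from.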