TL;DR: glibc’s malloc doesn’t work well for server workloads. Nice deep dive into ...

jeffbee · on Oct 6, 2023

glibc's malloc doesn't work well for any workload. If your workload is anything other than trivial, you should be using another one.

> FUD around switching to something like mimalloc or the most recent tcmalloc

Certainly, and the specific FUD about libc++ is unwarranted for tcmalloc, considering the developer of tcmalloc also uses libc++.

spockz · on Oct 6, 2023

So which malloc should we be using? And why not make it the default, if the trade-off is apparently so cookie-cutter clear?

kelnos · on Oct 7, 2023

I assume what the GP actually meant was that glibc's allocator isn't tuned for any specific workloads. It tries to be a jack-of-all-trades, and in doing so, just ends up being okay-ish.

As for what allocator you should be using, it depends on your workload. You'd need to test out some different options to see what works best. And "best" might also be something you have to define. Maybe you're ok with higher baseline memory usage if allocations are faster. Or maybe you want the lower memory usage.
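A minimal sketch of that kind of test, in Rust purely for illustration: time an allocation-heavy workload, then rebuild the same binary with a different allocator and compare. `workload` and `best_of` are hypothetical placeholders, not part of any benchmarking library; a real comparison needs a workload that resembles your program.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical allocation-heavy workload: many short-lived Vecs of varying size.
// Replace it with something representative of your own program.
fn workload(n: usize) -> usize {
    let mut total = 0;
    for i in 0..n {
        let v: Vec<u64> = (0..(i % 64) as u64).collect();
        // black_box discourages the optimizer from deleting the allocation.
        total += black_box(v).len();
    }
    total
}

// Runs `f` `runs` times and keeps the fastest wall-clock time, which is
// less sensitive to scheduling noise than a single run.
fn best_of<F: FnMut() -> usize>(runs: usize, mut f: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..runs {
        let start = Instant::now();
        black_box(f());
        best = best.min(start.elapsed());
    }
    best
}
```

Run `best_of(10, || workload(1_000_000))` once per allocator build. Since "best" may mean lower memory rather than more speed, it is worth recording peak RSS for each build as well.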

vlovich123 · on Oct 9, 2023

Actually, mimalloc and tcmalloc both show that they outperform glibc across almost any workload. It’s difficult to outperform everywhere, of course, but in terms of “works best without knowing the workload a priori”, tcmalloc and mimalloc should handily trounce glibc. Now, if you know your workload and start tuning glibc, maybe it can outperform them? Not sure, but I’d be skeptical. Tcmalloc and mimalloc both apply really advanced techniques, some of which require newer kernel support to implement user-side RCU, iirc. Glibc’s allocator, by comparison, is much longer in the tooth and just can’t compete on that.

Most workloads don’t care of course, but glibc’s penchant for hanging onto RAM when not needed is one very user visible consequence of this.

vlovich123 · on Oct 7, 2023

Tcmalloc (not the gperftools variant but the new one) and mimalloc would be the two I would try first.

I am honestly surprised no one has bothered replacing the default allocator in distros. A better default would free up RAM, since both tcmalloc and mimalloc do a good job of knowing when to release memory back to the OS (not to mention that they’re generally faster allocators anyway).

jeffbee · on Oct 8, 2023

Well, tcmalloc can’t be dynamically loaded, so it can’t be your system allocator. But mimalloc would be worth testing in that role.

vlovich123 · on Oct 8, 2023

Why is that the requirement? E.g., if glibc decided to replace its own allocator with tcmalloc, what would be the blocker?

jeffbee · on Oct 6, 2023

If you have a non-trivial workload (i.e. one that costs money) you should shop around and use the one that is the most efficient in your application.

hinkley · on Oct 6, 2023

In the early days, Java’s allocator was quite a bit faster than C++’s, in part because C++ memory allocation was not fully concurrent, which made allocation part of the sequential portion in Amdahl’s law.
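To make the Amdahl’s law point concrete, a small sketch: if a fraction `serial` of the runtime cannot run in parallel (hypothetically, allocation behind a global lock), the best possible speedup on `cores` cores is 1 / (serial + (1 - serial) / cores). The 10% figure used below is an illustrative assumption, not a measurement.

```rust
/// Amdahl's law: best-case speedup on `cores` cores when a fraction `serial`
/// of the runtime (e.g. allocation behind a global lock) cannot run in parallel.
fn amdahl_speedup(serial: f64, cores: u32) -> f64 {
    assert!((0.0..=1.0).contains(&serial), "serial must be a fraction in [0, 1]");
    assert!(cores > 0, "need at least one core");
    1.0 / (serial + (1.0 - serial) / f64::from(cores))
}
```

With 10% of the time serialized, `amdahl_speedup(0.1, 16)` is 6.4 (1 / (0.1 + 0.9 / 16)), and no number of cores gets past 1 / 0.1 = 10x.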

Where we are now is better, but has its own problems.

masklinn · on Oct 6, 2023

> Where we are now is better

Not really. Lots of Rust users come from high-level languages, so when people complain about Rust being unexpectedly slow, allocation issues are in the top 5 at least. It’s incredible how shit the standard allocators are in almost all systems.

saagarjha · on Oct 7, 2023

They’re not. They are supposed to be general-purpose allocators with requirements that may not match your workload.

masklinn · on Oct 7, 2023

> They’re not.

Are too

> They are supposed to be general-purpose allocators with requirements that may not match your workload.

I don’t think they match any workload which exists anywhere anymore. Possibly their workload existed back in the 90s.

Jemalloc is also a general-purpose allocator; it’s the one FreeBSD uses.

athanagor2 · on Oct 6, 2023

Isn’t this inevitable if the user allocates each time a struct/array is created, rather than allocating upfront?
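One common Rust idiom for allocating up front instead of per object, sketched under the assumption that input arrives through `BufRead`: `BufRead::lines()` yields a brand-new `String` for every line, whereas `read_line` into one reused buffer only touches the allocator when a line outgrows the current capacity. The function name and the 256-byte starting capacity are arbitrary choices for illustration.

```rust
use std::io::{self, BufRead};

/// Sums the word counts of all lines while reusing a single String buffer.
fn total_words<R: BufRead>(mut reader: R) -> io::Result<usize> {
    // Allocated once, up front; it grows only if a longer line shows up,
    // and then keeps the larger capacity for later lines.
    let mut line = String::with_capacity(256);
    let mut total = 0;
    // read_line appends to `line` and returns 0 at end of input.
    while reader.read_line(&mut line)? > 0 {
        total += line.split_whitespace().count();
        line.clear(); // empties the string but keeps its allocation
    }
    Ok(total)
}
```

For example, `total_words("a b c\nd e\n".as_bytes())` works because `&[u8]` implements `BufRead`.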

masklinn · on Oct 6, 2023

Meh. Users not used to tracking allocations keep to the same regime (lots of allocations), but because the platform allocators are utterly terrible, the allocation overhead is orders of magnitude higher than in even a relatively basic runtime, so the program is dog slow.

This is not inevitable; platforms could provide allocators which are less awful. Obviously they can’t be as fast as specialised runtime facilities, but when you see the gains many applications get by just swapping in jemalloc or some such…

forrestthewoods · on Oct 7, 2023

> the platform allocators are utterly terrible the allocation overhead is orders of magnitude higher than in even a relatively basic runtime, thus the program is dog slow

Huh. I’ve seen lots of people write slow Rust code because they didn’t realize there were allocations. But the complaint is usually “my Rust program is only 5x faster than my equivalent Python program instead of 100x, how come?”

Can you point out some good blog posts about Rust or C++ being slower than a “basic runtime” due to inferior allocator?

PH95VuimJjqBqy · on Oct 6, 2023

I would simplify it and say it's an issue with developers who don't understand manual memory management (allocating memory pools up front is just one such strategy for manual memory management).
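A memory pool in its simplest form can be sketched as a free list of reusable buffers. `BufferPool`, `take` and `give_back` are hypothetical names, not a library API; a production pool would also want a size cap and, if shared across threads, synchronization.

```rust
/// A minimal pool of reusable byte buffers (hypothetical helper).
struct BufferPool {
    free: Vec<Vec<u8>>,       // buffers returned to the pool, ready for reuse
    default_capacity: usize,  // capacity for buffers created when the pool is empty
}

impl BufferPool {
    fn new(default_capacity: usize) -> Self {
        BufferPool { free: Vec::new(), default_capacity }
    }

    /// Hands out a recycled buffer if one is available; otherwise allocates.
    fn take(&mut self) -> Vec<u8> {
        self.free
            .pop()
            .unwrap_or_else(|| Vec::with_capacity(self.default_capacity))
    }

    /// Clears the buffer but keeps its capacity, so a later `take` skips the allocator.
    fn give_back(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        self.free.push(buf);
    }
}
```

The point is that the allocator is hit once per buffer rather than once per use; after warm-up, `take` and `give_back` are just a push and a pop on `free`.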

masklinn · on Oct 7, 2023

That’s not a simplification, that’s a misattribution.

It’s not the beginner’s fault that most system allocators suck so hard you must do your utmost to limit their use.

PH95VuimJjqBqy · on Oct 9, 2023

This is akin to the idea that a compiler will eventually solve performance problems.

We're still waiting on that to happen.

There's a difference between a developer who understands manual memory management choosing to use a tech that does it for you and a developer choosing to use a tech that does it for you so they don't have to learn manual memory management.

Manual memory management is table stakes for any halfway competent developer.

hinkley · on Oct 7, 2023

Otherwise known as victim blaming. C developers are that old guy who white-knuckles everything and calls people weak for wanting help.

Many of the features in modern programs didn’t exist in the ’90s, not because they hadn’t been thought of, but because people spent all their energy getting the first fifty features to work reliably. Every app had a couple of cool features. Most of them are de rigueur now because we can.

hinkley · on Oct 7, 2023

If Rust isn’t doing escape analysis for stack vs. heap allocation, then what is even the fucking point of this language? I would have thought that was the first thing implemented.

vlovich123 · on Oct 7, 2023

Rust doesn’t allocate anything on the heap unless you tell it to. When you tell it to, it puts it on the heap. The target use case is as a systems language.

Escape analysis as you’re alluding to isn’t needed in this model, because the number of times it helps you (i.e., you put something on the heap but the compiler figures out it can live on the stack) is about zero. You need escape analysis in managed-memory languages, where everything is nominally a heap allocation and the compiler is responsible for clawing back performance through escape analysis.
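One way to check the "only when you tell it to" claim, sketched with a hypothetical counting allocator: `#[global_allocator]` routes every heap allocation through a `GlobalAlloc` implementation, so counting calls to `alloc` shows that a plain struct value causes no heap allocation while `Box::new` does. `CountingAlloc`, `Point` and `allocations_during` are illustrative names. Note that Rust's documentation allows the optimizer to remove unused allocations, so `std::hint::black_box` is used (on a best-effort basis) to keep the box alive.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

// Counts allocation requests, then defers to the system allocator.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

struct Point {
    x: f64,
    y: f64,
}

// Number of heap allocations performed while running `f`.
fn allocations_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}

fn stack_point_allocs() -> usize {
    allocations_during(|| {
        let p = Point { x: 1.0, y: 2.0 }; // a plain value: lives on the stack
        black_box(p);
    })
}

fn boxed_point_allocs() -> usize {
    allocations_during(|| {
        let p = Box::new(Point { x: 1.0, y: 2.0 }); // explicit heap allocation
        black_box(p);
    })
}
```

`stack_point_allocs()` should report 0 and `boxed_point_allocs()` at least 1. The same `#[global_allocator]` hook is what lets a program swap in jemalloc or mimalloc.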

wongarsu · on Oct 6, 2023

Rust used to use jemalloc, which was arguably the better default. For some reason they switched it to the system allocator when they added support for specifying the global allocator.
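For reference, this is roughly what that support looks like when a program explicitly selects the system allocator, which is what Rust uses by default today. Allocator crates such as `mimalloc` or `tikv-jemallocator` provide their own types that go in place of `System` here; check each crate's documentation for the exact type name. `build_words` is only a placeholder workload.

```rust
use std::alloc::System;

// Explicitly pick the platform allocator (glibc malloc on typical Linux).
// Swapping allocators means changing the type of this one static.
#[global_allocator]
static GLOBAL: System = System;

// Placeholder workload: every String here is allocated through GLOBAL.
fn build_words(n: usize) -> Vec<String> {
    (0..n).map(|i| format!("word{i}")).collect()
}
```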

steveklabnik · on Oct 7, 2023

jemalloc was not the default on every platform, but was on some. It adds a non-trivial amount of binary size to every program, even ones that don't need the performance boost, and binary size is an area where Rust is often criticized relative to C.

That said, there are a few reasons beyond just that. For the primary sources: https://github.com/rust-lang/rust/issues/36963