Hi, I don't entirely agree: most of these options apply equally to C, apart from libcxx and some others. ThinLTO and the inline threshold, for example, can also be turned on for C compilation units.
It is quite bad if changing the allocator made your program run at half speed. It doesn't matter whether the library itself consumed more CPU or caused more waiting due to concurrency or excessive syscalls (I'm guessing the waiting is the case here).