The performance of sort being exactly the same as the C++ implementation looked ...

ludocode · on Dec 30, 2020

I don't believe this is correct. Passing a function name to a C++ template doesn't mean it will be called by an indirection at runtime. The compare function will certainly be inlined. Wrapping it in a lambda is useless and would at best optimize to the identical code.

leni536 · on Dec 30, 2020

If you pass a function name then the corresponding type template parameter deduces to a function pointer type. Then this function pointer is passed all the way down where the comparison happens, and by that point it's unlikely to be inlined.

edit: In this talk this exact scenario is analysed in depth:

https://youtu.be/8nyq8SNUTSc?t=435

ludocode · on Dec 30, 2020

It appears you are technically correct. TIL!

I don't think it matters in this benchmark though. Compilers can certainly deduce that it's a compile-time constant because the template is fully instantiated in the same translation unit and only used once so it's not hard for a compiler to propagate it. At least in my tests GCC and Clang seem to propagate it just fine.

I suppose a case where it might matter would be if you call std::sort several times in the same translation unit (or the same binary under LTO) with the same template types but different comparison functions. In that case calls to std::sort with different comparison functions would share an implementation so it couldn't be inlined.

You're right though, it should be fixed. Rather than making it a lambda, the benchmark probably shouldn't pass a compare expression to std::sort at all so it can make its own decisions on how to compare values. For example I expect C++20 implementations of std::sort may provide alternate implementations of partitioning that use the spaceship operator if it exists (at least my implementation will.) This is one of the true advantages C++ has that can't be replicated in C.

gpderetta · on Dec 30, 2020

These days GCC and clang can inline function pointers in many scenarios even through function calls, but it wasn't the case even a few years ago, and they still fail to inline in many important cases.

Inlining pointers requires doing constant propagation and the compiler can give up if it has to track too much stuff, while in the lambda case, the type of the comparator is always available.

Also lambdas can be removed in the early inliner stage exposing them to more optimization opportunities, while function pointers have to be inlined after constant propagation.

As usual, it is a trade off, so there is no substitute to measuring in your specific scenario.

leni536 · on Dec 30, 2020

> At least in my tests GCC and Clang seem to propagate it just fine.

In my tests gcc clones sort and inlines the comparison only in -O3. clang doesn't do it even with -O3, neither with libstdc++ nor libc++.

https://godbolt.org/z/4feeex