Hacker News new | past | comments | ask | show | jobs | submit login

The performance of sort being exactly the same as the C++ implementation looked a bit suspicious to me at first. Notable that in the C++ performance test a function pointer is passed to std::sort in a similar way how the C library is used [1].

This is probably a good way to check that the C library doesn't do worse in this case, but it's notable that C++ has potentially faster ways to do the sorting:

1. Pass a function object (even a lambda that just wraps "compare")

2. Don't pass a comparison at all for the "vector<int>" case, as the default works there.

Both of these methods have better chances for inlining the comparison within the sort implementation. For the function pointer case it would require constant propagation and possibly "cloning" to do the same, which is less likely for a complicated algorithm like sort.

[1] https://github.com/glouw/ctl/blob/09f5e0a89d3fc621aff94290c4...

edit: I wonder how ctl's vector sort compares to qsort, I expect it to be faster.




I don't believe this is correct. Passing a function name to a C++ template doesn't mean it will be called by an indirection at runtime. The compare function will certainly be inlined. Wrapping it in a lambda is useless and would at best optimize to the identical code.


If you pass a function name then the corresponding type template parameter deduces to a function pointer type. Then this function pointer is passed all the way down where the comparison happens, and by that point it's unlikely to be inlined.

edit: In this talk this exact scenario is analysed in depth:

https://youtu.be/8nyq8SNUTSc?t=435


It appears you are technically correct. TIL!

I don't think it matters in this benchmark though. Compilers can certainly deduce that it's a compile-time constant because the template is fully instantiated in the same translation unit and only used once so it's not hard for a compiler to propagate it. At least in my tests GCC and Clang seem to propagate it just fine.

I suppose a case where it might matter would be if you call std::sort several times in the same translation unit (or the same binary under LTO) with the same template types but different comparison functions. In that case calls to std::sort with different comparison functions would share an implementation so it couldn't be inlined.

You're right though, it should be fixed. Rather than making it a lambda, the benchmark probably shouldn't pass a compare expression to std::sort at all so it can make its own decisions on how to compare values. For example I expect C++20 implementations of std::sort may provide alternate implementations of partitioning that use the spaceship operator if it exists (at least my implementation will.) This is one of the true advantages C++ has that can't be replicated in C.


These days GCC and clang can inline function pointers in many scenarios even through function calls, but it wasn't the case even a few years ago, and they still fail to inline in many important cases.

Inlining pointers requires doing constant propagation and the compiler can give up if it has to track too much stuff, while in the lambda case, the type of the comparator is always available.

Also lambdas can be removed in the early inliner stage exposing them to more optimization opportunities, while function pointers have to be inlined after constant propagation.

As usual, it is a trade off, so there is no substitute to measuring in your specific scenario.


> At least in my tests GCC and Clang seem to propagate it just fine.

In my tests gcc clones sort and inlines the comparison only in -O3. clang doesn't do it even with -O3, neither with libstdc++ nor libc++.

https://godbolt.org/z/4feeex




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: