
I guess you're talking about MSVC. I find it hard to care much what that does. Code where performance really matters is not built with it.

But you're right, their std lib not using their own intrinsic is pathetic.




Sometimes we don't have a choice of toolchain used due to distribution targets or other dependencies :-( I'd prefer to be using GCC / Clang for everything too...


I wrote to a maintainer of the MSVC lib. He says their lib has to work on all amd64s, but some (specifically, AMD before K10, and Intel before Nehalem, which introduced POPCNT alongside SSE4.2) have no popcount. He says their intrinsics are defined to emit exactly the instruction named, unlike GCC's, so they can't use that in their library.

No explanation why they use the loop form, except that the code hasn't been touched in a long, long time.
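For readers who haven't seen the two styles side by side, here is a minimal sketch of a "loop form" popcount next to a branch-free bit-trick version. The thread doesn't show the library's actual code, so both functions (and their names) are illustrative assumptions, not the MSVC implementation. Neither needs the POPCNT instruction, so either would run on every amd64.

```cpp
#include <cstdint>

// Loop form: clear the lowest set bit until none remain.
// Runs one iteration per set bit.
inline unsigned popcount_loop(std::uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;  // drops the lowest set bit
        ++n;
    }
    return n;
}

// Branch-free form: sum bits in parallel within 2-, 4-, then 8-bit fields,
// then add the eight byte counts together with one multiply.
inline unsigned popcount_swar(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
}
```

Whether a given compiler turns either of these into a single popcnt depends on the compiler and the target flags it was given.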


Surely they can check if AVX is enabled and use the intrinsic if so?


That would involve changing code that hasn't been touched since before AVX or even SSSE3 existed, and probably not since before amd64 existed.

But it's hard to switch on the use of a single instruction at run time. Checking at the use site consumes a branch predictor slot. Switching in a function pointer interferes with inlining. Self-modifying code would have been the old way. The modern way might be rewriting in the linker or loader, or JIT compiling.
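To make the function-pointer option concrete, here is a rough sketch. All names are made up for illustration. The hardware path assumes GCC or Clang on x86-64 (it uses their `target` attribute, `__builtin_popcountll` and `__builtin_cpu_supports`); on any other toolchain the detection is stubbed out and the portable loop is always chosen.

```cpp
#include <cstdint>

// Portable fallback that runs on every amd64 CPU.
static unsigned popcount_portable(std::uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;  // drops the lowest set bit
        ++n;
    }
    return n;
}

#if defined(__GNUC__) && defined(__x86_64__)
// GCC/Clang only: compile just this one function with POPCNT enabled.
__attribute__((target("popcnt")))
static unsigned popcount_hw(std::uint64_t x) {
    return static_cast<unsigned>(__builtin_popcountll(x));
}
// Ask the running CPU whether it has the instruction.
static bool cpu_has_popcnt() {
    return __builtin_cpu_supports("popcnt") != 0;
}
#else
// Other toolchains: detection is left as a stub in this sketch.
static unsigned popcount_hw(std::uint64_t x) { return popcount_portable(x); }
static bool cpu_has_popcnt() { return false; }
#endif

using popcount_fn = unsigned (*)(std::uint64_t);

static popcount_fn resolve_popcount() {
    return cpu_has_popcnt() ? popcount_hw : popcount_portable;
}

// The implementation is picked once, on the first call. Every later call
// goes through the pointer, so the callee cannot be inlined at the use site.
unsigned popcount_dispatch(std::uint64_t x) {
    static const popcount_fn impl = resolve_popcount();
    return impl(x);
}
```

The cost described above is visible here: each call pays for an indirect call (plus the check on the function-local static) around what would otherwise be a single instruction.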

I have discovered that compilers are extremely bad at recognizing hand-coded byte-order swapping and dropping in movbe or bswap instructions. That GCC and Clang recognized ham-handed pop counting loops seems miraculous now.
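For reference, these are the kinds of hand-coded byte-order swaps in question, sketched with made-up function names. Whether a particular compiler collapses either one into a single bswap or movbe depends on the compiler, its version and the flags, so nothing here asserts what gets emitted.

```cpp
#include <cstdint>

// Shift-and-mask swap of a 32-bit value already in a register.
inline std::uint32_t bswap32(std::uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

// Byte-at-a-time big-endian load; independent of the host's byte order.
inline std::uint32_t load_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}
```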



