I guess you're talking about MSVC. I find it hard to care much what that does. C...

snovv_crash · on Sept 10, 2019

Sometimes we don't get a choice of toolchain, because of distribution targets or other dependencies :-( I'd prefer to be using GCC/Clang for everything too...

ncmncm · on Sept 13, 2019

I wrote to a maintainer of the MSVC library. He says it has to work on every amd64 CPU, and some of them (specifically, AMD before K10 and Intel before Nehalem) have no POPCNT instruction. He says their intrinsics are defined to emit exactly the instruction named, unlike GCC's __builtin_popcount, which falls back to a software sequence when POPCNT isn't enabled, so they can't use the intrinsic in their library.

He gave no explanation for why they use the loop form, except that the code hasn't been touched in a long, long time.

snovv_crash · on Sept 15, 2019

Surely they can check if AVX is enabled and use the intrinsic if so?

ncmncm · on Sept 18, 2019

That would mean changing code that hasn't been touched since before AVX, or even SSSE3, existed; probably not since before amd64 existed.

But it's hard to switch on the use of a single instruction. Checking at the use site costs a branch on every call, and that branch takes up a slot in the branch predictor. Switching in a function pointer blocks inlining. Self-modifying code would have been the old way; the modern way might be rewriting in the linker or loader, or JIT compiling.

I have discovered that compilers are extremely bad at recognizing hand-coded byte-order swapping and dropping in movbe or bswap instructions. That GCC and Clang recognize ham-handed pop-counting loops seems miraculous now.