Sometimes we don't get to choose the toolchain, because of distribution targets or other dependencies :-( I'd prefer to use GCC / Clang for everything too...
I wrote to a maintainer of the MSVC lib. He says their lib has to work on all amd64 CPUs, but some (specifically, AMD before K10, and Intel before Nehalem) have no popcount instruction. He says their intrinsics are defined to emit exactly the instruction named, unlike GCC's builtins, so they can't use the popcount intrinsic in their library.
No explanation why they use the loop form, except that the code hasn't been touched in a long, long time.
That would involve changing code not touched since before AVX or even SSSE3 existed. Probably not since before amd64 existed, either.
But it's hard to switch on use of a single instruction. Checking at the use site consumes a branch predictor slot. Switching in a function pointer interferes with inlining. Self-modifying code would have been the old way. The modern way might be rewriting in the linker or loader, or JIT compiling.
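To make the first two options concrete, here is a rough sketch of checking at the use site versus switching in a function pointer. All the function names are made up for illustration, and the detection path assumes GCC or Clang on x86-64 (`__builtin_cpu_supports` and the `target("popcnt")` attribute); on any other compiler it just falls back to the loop. This is not how any particular standard library does it.

```c
#include <stdint.h>

/* Portable fallback: clears the lowest set bit once per iteration. */
static unsigned popcount_loop(uint64_t x) {
    unsigned n = 0;
    while (x) { x &= x - 1; ++n; }
    return n;
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Assumption: GCC/Clang on x86-64. POPCNT is enabled for this function only. */
__attribute__((target("popcnt")))
static unsigned popcount_hw(uint64_t x) {
    return (unsigned)__builtin_popcountll(x);
}
static int cpu_has_popcnt(void) {
    return __builtin_cpu_supports("popcnt");
}
#else
/* Other toolchains: no detection in this sketch, always use the loop. */
static unsigned popcount_hw(uint64_t x) { return popcount_loop(x); }
static int cpu_has_popcnt(void) { return 0; }
#endif

/* Option 1: check at the use site. Inlinable, but every call carries a branch. */
static unsigned popcount_branch(uint64_t x) {
    return cpu_has_popcnt() ? popcount_hw(x) : popcount_loop(x);
}

/* Option 2: function pointer. It starts at a resolver that picks an
   implementation on the first call and then gets out of the way.
   The pointer update is not synchronized, so this is single-threaded only. */
static unsigned popcount_resolve(uint64_t x);
static unsigned (*popcount_ptr)(uint64_t) = popcount_resolve;

static unsigned popcount_resolve(uint64_t x) {
    popcount_ptr = cpu_has_popcnt() ? popcount_hw : popcount_loop;
    return popcount_ptr(x);
}

/* No per-call feature test, but the indirect call cannot be inlined. */
static unsigned popcount_indirect(uint64_t x) {
    return popcount_ptr(x);
}
```

Either way, a one-instruction operation ends up wrapped in a branch or an indirect call, which is the cost being complained about.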
I have discovered that compilers are extremely bad at recognizing hand-coded byte-order swapping and dropping in movbe or bswap instructions. That GCC and Clang recognized ham-handed pop counting loops seems miraculous now.
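For reference, these are the kinds of hand-coded forms I mean (function names are mine, just for illustration). Both are correct portable C. Whether a given compiler and optimization level turns either one into a single bswap or movbe varies, so look at the generated assembly rather than assuming.

```c
#include <stdint.h>

/* Shift-and-mask byte reversal of a 32-bit value. */
static uint32_t swap32_shifts(uint32_t x) {
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Assemble a big-endian 32-bit value from four bytes in memory.
   On a little-endian machine this is a load plus a byte swap. */
static uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```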
But you're right, their std lib not using their own intrinsic is pathetic.