Hacker News

I thought POPCNT had a bug? I'm going off memory of some very in-depth blog posts on how it slowed run times down rather than speeding them up, though I may have misremembered. Before I even finished this post I went and found it. Looks like it might only affect Haswell and Sandy/Ivy Bridge: http://stackoverflow.com/questions/25078285/replacing-a-32-b...



There is a bug that significantly reduces performance when certain register combinations are used, but the current generation of compilers has been patched to avoid triggering it. So it's something to be on the lookout for, but if you are compiling your own code (or working directly in assembly) it's not a reason to avoid POPCNT.

If you are distributing high-performance code that might be compiled with unpatched compilers, you should probably take steps to mitigate it. In that case a snippet of inline assembly, where you control the registers, is probably the best solution, since even if you avoid calling __builtin_popcount() explicitly the compiler might "optimize" your algorithm into using it anyway.
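For illustration, here is a minimal sketch of that kind of inline-assembly wrapper, assuming GCC or Clang on x86-64. The slowdown is a false dependency on POPCNT's destination register, so the sketch zeroes the destination with XOR right before the POPCNT. The names popcount64_nodep and popcount64_portable are made up for this example, and the runtime CPU check plus portable fallback are my additions, not something from the linked post.

```c
#include <stdint.h>

/* Portable fallback: clears the lowest set bit on each iteration. */
static uint64_t popcount64_portable(uint64_t x)
{
    uint64_t n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: POPCNT with the destination zeroed first, so the
 * instruction does not wait on whatever last wrote that register. */
static uint64_t popcount64_nodep(uint64_t x)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("popcnt")) {
        uint64_t n;
        /* "=&r" (early clobber) keeps the output in a different register
         * from the input, since the XOR writes it before x is read. */
        __asm__("xor %0, %0\n\t"
                "popcnt %1, %0"
                : "=&r"(n)
                : "r"(x)
                : "cc");
        return n;
    }
#endif
    return popcount64_portable(x);
}
```

Because the register handling lives inside the asm statement, the result does not depend on whether the compiler building it knows about the issue.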


My own code is in assembly. Plus, when that came out I tested the different alternatives, and found that my hardware didn't have the problem. Perhaps it was too old.

I also did extensive comparisons between the POPCNT version and the nibble-based version using PSHUFB, and showed that the POPCNT version was always faster.
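For anyone unfamiliar with the nibble-based approach, here is a scalar sketch of the idea in C. This is not the assembly described above, just an illustration: each byte is split into two 4-bit halves and each half is looked up in a 16-entry table of bit counts. A PSHUFB version does that same table lookup on 16 bytes at once, with the table held in a vector register. The names NIBBLE_BITS and popcount64_nibble are invented for the example.

```c
#include <stdint.h>

/* Number of set bits in each possible 4-bit value. */
static const uint8_t NIBBLE_BITS[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Hypothetical helper: popcount by summing per-nibble table lookups. */
static uint64_t popcount64_nibble(uint64_t x)
{
    uint64_t n = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t b = (uint8_t)(x >> (8 * i));   /* byte i of x */
        n += NIBBLE_BITS[b & 0x0F];            /* low nibble  */
        n += NIBBLE_BITS[b >> 4];              /* high nibble */
    }
    return n;
}
```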



