Besides just timing it (excellent!) did you take a look at what it compiles to?

https://godbolt.org/z/C0ZeE5

It's faster because both GCC and Clang now optimize loops that use the n &= n - 1 trick to use AVX2 SIMD! I haven't looked closely to confirm, but I think they may in fact even do Harley-Seal for "naive popcount" loops.

C is no longer portable assembly. If you want to test whether a particular algorithm is faster than another, you probably need to write assembly, or at least confirm that the compiler did what you thought it did.




I think I'll stick with paying people like you to deal with these sorts of issues in my code. ;)



