There is actually such an instruction: psadbw (_mm_sad_epu8). It isn't noticeably faster for the scenario you describe, since it only affects the outer loop, but it does avoid the need for the popcnt instruction, because psadbw only requires SSE2.
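
For illustration, here is a rough sketch of the trick, assuming the per-byte values you want to add up already sit in a single 128-bit register. The function names hsum_epu8 and hsum_bytes16 are made up for this example; only the intrinsics are real.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_sad_epu8 */
#include <stdint.h>

/* Hypothetical helper: sum the 16 unsigned bytes held in v. */
static inline uint32_t hsum_epu8(__m128i v)
{
    /* psadbw against zero: each 64-bit half receives the sum of its
       own 8 bytes (at most 8 * 255 = 2040). */
    __m128i sad = _mm_sad_epu8(v, _mm_setzero_si128());
    /* Bring the high half down and add it to the low half. */
    __m128i high = _mm_unpackhi_epi64(sad, sad);
    return (uint32_t)_mm_cvtsi128_si32(_mm_add_epi32(sad, high));
}

/* Hypothetical convenience wrapper: sum 16 bytes from unaligned memory. */
static inline uint32_t hsum_bytes16(const uint8_t *p)
{
    return hsum_epu8(_mm_loadu_si128((const __m128i *)p));
}
```

Everything here is SSE2, so it runs on any x86-64 CPU. The sum of absolute differences against an all-zero register is simply the sum of the bytes, which is why no popcnt is needed for the final reduction.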


