Hacker News new | past | comments | ask | show | jobs | submit login

I agree completely. Back in the 8086 days, Boyer Moore string search delivered significant speed up over linear memory scan, but nowadays nothing practically beats SIMD linear scans except in very specialized situations.



Indeed. Many Boyer Moore implementations implement the skip loop by using memchr. But in practice, it's often faster to forgo Boyer Moore completely and heuristically select the rarest byte in the string you're searching, and then run memchr on that byte. You can even pick the second rarest byte and use that as a quick way to throw out false positives before doing a full string comparison.

(The aforementioned optimization is one of the tricks ripgrep uses that occasionally causes it to be faster than GNU grep. GNU grep uses a traditional Boyer Moore substring search with memchr always applied using the last byte in the pattern.)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: