Hacker News new | past | comments | ask | show | jobs | submit login

Indeed. Many Boyer Moore implementations implement the skip loop by using memchr. But in practice, it's often faster to forgo Boyer Moore completely and heuristically select the rarest byte in the string you're searching, and then run memchr on that byte. You can even pick the second rarest byte and use that as a quick way to throw out false positives before doing a full string comparison.

(The aforementioned optimization is one of the tricks ripgrep uses that occasionally causes it to be faster than GNU grep. GNU grep uses a traditional Boyer Moore substring search with memchr always applied using the last byte in the pattern.)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: