The initial length + unique character test allows us to negative match in a way that is effectively identical to a bloom filter.
If the initial negative match test indicates no match, then we can be certain there is not a match. Otherwise, we might have a match, and need to compare the candidate strings in more detail.
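The filter described above could be sketched roughly like this. This is an illustrative guess at the shape of the technique, not the article's actual implementation; the struct layout and field names are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the length + unique-character negative match.
 * Like a bloom filter, a failure in the cheap tests is definitive (no
 * match), while a pass only means "maybe", so we fall through to the
 * full comparison. */
typedef struct {
    const char *str;        /* known string in the table                */
    size_t      len;        /* precomputed length of that string        */
    size_t      unique_pos; /* position of a distinguishing byte        */
    char        unique_ch;  /* the distinguishing byte itself           */
} entry;

/* Returns true only if `input` (of length `input_len`) starts with e->str. */
static bool prefix_match(const entry *e, const char *input, size_t input_len)
{
    if (input_len < e->len)                   return false; /* too short  */
    if (input[e->unique_pos] != e->unique_ch) return false; /* wrong byte */
    return memcmp(input, e->str, e->len) == 0; /* full check on "maybe"  */
}
```

The two cheap tests reject most non-matching inputs without touching more than one byte of the candidate string.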
I can't imagine how hashing would help in either the fast-path negative match case, or the prefix match case. Hashing is sort of at odds with prefix matching, in my opinion.
Could you get good-enough performance with something simpler and more portable?
The optimized assembly is much more complicated and specialized. Porting to other platforms/architectures is very difficult, and you lose out on the compiler's ability to make assumptions and optimize your code.
String length is a very poor probabilistic filter as you're specifically looking for substrings.
Also, if you already have a hash of your input string's prefix and the hashes of all of your known strings, then the case where you actually need to check against your "known" strings becomes a fast search of a set of numbers, making the fall-through case of your code negligible. If you impose a limit on the number of strings in your "known" pool, the compiler can have a field day.
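To make the idea concrete, here is one way the hash-first approach could look. This is a sketch of the proposal, not anything from the article, and it assumes every known string has the same fixed length (PREFIX_LEN), since hashing a variable-length prefix would require re-hashing per candidate length:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fixed prefix length; this restriction is what makes hashing
 * awkward for general prefix matching. */
#define PREFIX_LEN 4

/* FNV-1a: a simple, well-known 32-bit string hash. */
static uint32_t fnv1a(const char *s, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= (uint8_t)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Illustrative "known" pool; all entries are PREFIX_LEN bytes. */
static const char *known[] = { "GET ", "PUT ", "POST" };
#define NKNOWN (sizeof known / sizeof known[0])

static bool match_by_hash(const char *input, size_t input_len)
{
    if (input_len < PREFIX_LEN)
        return false;
    uint32_t h = fnv1a(input, PREFIX_LEN); /* hash the input prefix once */
    for (size_t i = 0; i < NKNOWN; i++) {
        /* The known-string hashes could be precomputed at build time;
         * they're computed inline here only to keep the sketch
         * self-contained. memcmp guards against hash collisions. */
        if (h == fnv1a(known[i], PREFIX_LEN) &&
            memcmp(input, known[i], PREFIX_LEN) == 0)
            return true;
    }
    return false;
}
```

With a small bounded pool, the loop over precomputed hashes compiles down to a handful of integer compares.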
> Could you get good-enough performance with something simpler and more portable?
Hmmm. Did you read the article? Any of the C versions are relatively simple and portable. They basically work the same as the assembly ones and give great performance... I'm not sure what you're trying to get at.
> The optimized assembly is much more complicated and specialized.
Well, it's my article, I wanted to see if I could beat a PGO C compiler with all its optimizations turned on :-)
> Porting platforms/architectures is very difficult and you loose out on the compiler's abilities to make assumptions and optimize your code.
I know the trade-offs that come with assembly. But if you're okay with them, you can get the best possible performance out of the underlying hardware. That was the goal of this article.
> String length is a very poor probabilistic filter as you're specifically looking for substrings.
I don't understand this comment. String length is a great filter because if any strings in the table are longer than the input search string, then there's no way we can have a prefix match.
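That rejection test is a single comparison of precomputed lengths; a minimal illustration (names are mine, not the article's):

```c
#include <stdbool.h>
#include <string.h>

/* If a table string is longer than the input, it cannot be a prefix of
 * the input, so it can be rejected without examining a single byte. */
static bool could_be_prefix(const char *table_str, const char *input)
{
    return strlen(table_str) <= strlen(input);
}
```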
> Also, if you already have a hash of your input string's prefix and the hashes of all of your known strings, then the case where you actually need to check against your "known" strings becomes a fast search of a set of numbers, making the fall-through case of your code negligible. If you impose a limit on the number of strings in your "known" pool, the compiler can have a field day.
I don't know where this fascination with hashes comes from... have you looked at the implementation? Even when it has to do a string match, the timings now are around 13-14 cycles. The nature of the data structure means you're really only ever doing max 2 comparisons... 3 if someone is sending you worst case data (and even so, with 3, you're only clocking ~20 cycles).
Hashing and prefix matching just don't play nicely together.