It's such a simple and elegant data structure. If you only care about word lo...

baur · on Nov 4, 2021

Nice implementation!

There is an option to get all suffixes without traversing the subtree, but it costs an extra O(N) memory, where N is the combined length of all stored words. Depending on the case that might be acceptable, since the memory for storing the words themselves is O(N) anyway. https://stackoverflow.com/a/29966616/2104560 (update 1 and update 3)
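
To make the trade-off concrete, here is a minimal sketch of one way to avoid the subtree walk. It is not a reproduction of the linked answer. It assumes the words are kept in a sorted list and that each trie node remembers the slice of that list sharing its prefix. The names build_range_trie and words_with_prefix are invented for this example, and it returns whole words rather than bare suffixes.

```python
def build_range_trie(words):
    """Build a trie whose nodes store a [lo, hi) range into the sorted word list."""
    words = sorted(set(words))
    root = {"lo": 0, "hi": len(words), "next": {}}
    for i, word in enumerate(words):
        node = root
        for ch in word:
            child = node["next"].get(ch)
            if child is None:
                # First word through this node: the range starts here.
                child = node["next"][ch] = {"lo": i, "hi": i, "next": {}}
            # Words are sorted, so everything sharing this prefix is contiguous.
            child["hi"] = i + 1
            node = child
    return root, words


def words_with_prefix(root, words, prefix):
    """Walk only the prefix, then slice the sorted list (no subtree traversal)."""
    node = root
    for ch in prefix:
        node = node["next"].get(ch)
        if node is None:
            return []
    return words[node["lo"]:node["hi"]]


root, words = build_range_trie(["cart", "car", "dog", "cat", "do"])
matches = words_with_prefix(root, words, "ca")
```

The extra cost is two integers per node, and there is at most one node per stored character, which is where the additional O(N) comes from in this variant.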

vanderZwan · on Nov 4, 2021

Thanks! I just realized it doesn't work with strings containing underscores, but simply using __ instead of _ (so double underscores) as the key for the end-of-word marker fixes that.
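
A minimal sketch of why the double underscore helps, assuming a trie built from nested dicts where every child key is a single character (the original implementation is not shown here, so insert, contains and END are hypothetical names). A one-character marker can collide with a real character in a word, while a two-character key never can.

```python
END = "__"  # two characters, so it can never equal a single-character child key


def insert(trie, word):
    """Add a word to a nested-dict trie, marking its last node with END."""
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})
    node[END] = True


def contains(trie, word):
    """Return True only if exactly this word was inserted."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return END in node


trie = {}
insert(trie, "a_")
insert(trie, "snake_case")
```

With a single "_" as the marker, inserting "a_" would create a "_" child under "a", and a lookup for "a" would then wrongly report a match.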

And thanks for the link, that is an interesting optimization!

EDIT: one fun non-practical application (histogramming the letters in a word is simpler and faster) is an anagram finder using prime numbers:

https://observablehq.com/@jobleonard/finding-anagrams-using-...
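
For readers who don't follow the link, a minimal sketch of the general idea, not necessarily how the notebook does it: give each letter its own prime and multiply. Because prime factorizations are unique, two words get the same product exactly when they use the same letters the same number of times. This assumes lowercase a-z input only; prime_key and group_anagrams are names made up for the example.

```python
from collections import defaultdict

# The first 26 primes, one per letter a..z.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def prime_key(word):
    """Multiply the primes of all letters; anagrams share the same product."""
    key = 1
    for ch in word:
        key *= PRIMES[ord(ch) - ord("a")]
    return key


def group_anagrams(words):
    """Bucket words by their prime product."""
    groups = defaultdict(list)
    for word in words:
        groups[prime_key(word)].append(word)
    return list(groups.values())


groups = group_anagrams(["listen", "google", "silent", "enlist"])
```

Python integers never overflow, so this sketch sidesteps the question of whether the product fits in a fixed-width type.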

baur · on Nov 4, 2021

That's a nice idea. I guess it's better to stay within the range of a primitive type (to avoid arbitrary-precision arithmetic), so we can "compress" up to 15 items into a single prime_product while keeping it below 2^64. For English words that should be just fine; I don't expect many words with 16 or more letters.

UPD: Sorry, "up to 15" was the wrong phrasing. I once checked how far the "prime factorial" (primorial) fits into a primitive type, and the product of the first 15 primes fits into a long. So it's possible to handle even more symbols if the word is something like "a" repeated 63 times, because the product would be just 2^63, which still fits in an unsigned 64-bit integer.
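
The 15-prime figure is easy to double-check. A small sketch, assuming the limit of interest is a signed 64-bit long (2^63 - 1); the helper names primes and distinct_primes_that_fit are invented for the example.

```python
def primes():
    """Yield primes in increasing order by trial division (fine for small counts)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def distinct_primes_that_fit(limit):
    """Count how many of the smallest primes can be multiplied without exceeding limit."""
    product, count = 1, 0
    for p in primes():
        if product * p > limit:
            return count, product
        product *= p
        count += 1


SIGNED_MAX = 2**63 - 1
UNSIGNED_MAX = 2**64 - 1
count, product = distinct_primes_that_fit(SIGNED_MAX)
```

This is the worst case of 15 distinct letters each appearing once. Repeated small primes go much further, and the 16th prime already pushes the product past an unsigned 64-bit value as well.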