Hacker News new | past | comments | ask | show | jobs | submit login

It's such a simple and elegant as data structure. If you only care about word lookup you can implement it in just 35 lines of modern JavaScript:

    function trieBuilder(word_list) {
      const root = {};
      for (const word of words_list) {
        let node = root;
        for (const char of word) {
          let nextNode = node[char];
          if (!nextNode) node[char] = nextNode = {};
          node = nextNode;
        }
        node._ = 1; // mark the nodes that are endings of real words
      }

      function findChildren(node, prefix, list, maxLength) {
        if (node._ === 1) list.push(prefix);
        for (const char in node) {
          findChildren(node[char], prefix + char, list, maxLength);
          if (list.length >= maxLength) return list;
        }
        return list;
      }

      function findSuffixes(prefix, maxLength) {
        prefix = prefix.toLowerCase();
        let node = root;
        for (const char of prefix) {
          let nextNode = node[char];
          if (!nextNode) return [""];
          node = nextNode;
        }
        let words = findChildren(node, prefix, [], maxLength);
        return words;
      }

      return {root, findSuffixes};
    }

demo: https://observablehq.com/@jobleonard/autocomplete



Nice implementation!

There is an option to get all suffixes without traversing subtree, but it comes with extra O(N) memory where N is combined length of all stored words - depending on case might be acceptable since memory for storing words itself is O(N) anyway. https://stackoverflow.com/a/29966616/2104560 (update 1 and update 3)


Thanks! I just realized it doesn't work with strings containing underscores, but simply using __ instead of _ (so double underscores) as the key to end-of-word markings fixes that.

And thanks for the link, that is an interesting optimization!

EDIT: one fun non-practical application (histogramming the letters in a word is simpler and faster) is an anagram finder using prime numbers:

https://observablehq.com/@jobleonard/finding-anagrams-using-...


That's a nice idea. I guess it's better to stay in primitive type range (to avoid long arithmetics), so we can "compress" up to 15 items into a single prime_product making it less than 2^64 - for English words should be just fine, I don't expect many words with 16 and more letters.

UPD: Sorry, "up to 15" is a wrong phrasing. I checked once how "prime factorial" fits into primitive, and first 15 primes can fit into long. So it's possible to handle even more symbols if it's smth like "aaaaaaa"64 times because it would be just 2^64




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: