My text editor, KeenWrite[0], supports variables in documents[1]. With variable names sprinkled throughout a document, replacing all of them with their corresponding values has to be fast. I wrote a benchmark, which may be illuminating, comparing Apache's StringUtils against the Aho-Corasick algorithm[2] as provided by Robert Bor's Trie implementation[3].
The integration uses a factory pattern to choose between Apache and a Trie depending on the length of the source document.[4]
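To make the factory idea concrete, here is a minimal sketch, not KeenWrite's actual code. Every name in it (`Replacer`, `SimpleReplacer`, `SinglePassReplacer`, `ReplacerFactory`, `THRESHOLD`) is hypothetical, the cutoff value is invented, and which strategy serves short versus long documents is an assumption. The two strategies are standard-library stand-ins: the first marks where a StringUtils call would go, the second where a Trie-backed Aho-Corasick matcher would go.

```java
import java.util.Map;

/** Hypothetical strategy interface; not KeenWrite's actual API. */
interface Replacer {
  String replace(String text, Map<String, String> variables);
}

/** Stand-in for the StringUtils strategy: one scan of the text per variable. */
class SimpleReplacer implements Replacer {
  @Override
  public String replace(String text, Map<String, String> variables) {
    String result = text;
    for (Map.Entry<String, String> entry : variables.entrySet()) {
      result = result.replace(entry.getKey(), entry.getValue());
    }
    return result;
  }
}

/**
 * Stand-in for the Trie strategy: a single left-to-right scan that takes the
 * longest variable name starting at each position. A real Aho-Corasick
 * matcher would avoid testing every name at every position.
 */
class SinglePassReplacer implements Replacer {
  @Override
  public String replace(String text, Map<String, String> variables) {
    StringBuilder out = new StringBuilder(text.length());
    int i = 0;
    while (i < text.length()) {
      String match = null;
      for (String name : variables.keySet()) {
        if (!name.isEmpty() && text.startsWith(name, i)
            && (match == null || name.length() > match.length())) {
          match = name;
        }
      }
      if (match == null) {
        out.append(text.charAt(i++));      // no variable here; copy one char
      } else {
        out.append(variables.get(match));  // emit the value, skip the name
        i += match.length();
      }
    }
    return out.toString();
  }
}

/** Chooses a strategy from the document length. */
final class ReplacerFactory {
  /** Assumed cutoff in characters; a real value would come from benchmarks. */
  static final int THRESHOLD = 1_000;

  private ReplacerFactory() {}

  static Replacer forText(String text) {
    // Assumption: the simple strategy is used below the cutoff.
    return text.length() < THRESHOLD
        ? new SimpleReplacer()
        : new SinglePassReplacer();
  }
}
```

A caller would write something like `ReplacerFactory.forText(doc).replace(doc, variables)` and never needs to know which strategy ran.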
> Wouldn’t it be easier to keep track of where the to-be-substituted patterns are in the document?
Tracking the location of substrings within a document under active edits is not trivial. Performing string substitutions in a separate thread prior to rendering for preview (or exporting to PDF) is straightforward and has no impact on UI performance.
> Tracking the location of substrings within a document under active edits is not trivial.
Why not? Many word processors do that all the time to track text styles.
You have an array of (start offset, end offset) pairs, optionally with a pointer to expansion info. When the text is edited, say by inserting c characters at offset O, go through the array, and update all offsets ≥ O by c.
When you find a pair where you have to update only one of the offsets, the text of the to-be-expanded variable changed.
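The bookkeeping described above could be sketched roughly as follows. This follows the description literally, handles insertions only, and uses hypothetical names throughout (`SpanTracker`, `Span`, `insert`); it is an illustration of the idea, not code from any word processor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tracker for variable locations; names are illustrative. */
class SpanTracker {
  /** A (start offset, end offset) pair, mutable so edits can shift it. */
  static final class Span {
    int start;
    int end;

    Span(int start, int end) {
      this.start = start;
      this.end = end;
    }
  }

  private final List<Span> spans = new ArrayList<>();

  void add(int start, int end) {
    spans.add(new Span(start, end));
  }

  /**
   * Records that count characters were inserted at offset: every stored
   * offset >= offset is increased by count. Returns the spans where only
   * one of the two offsets moved, meaning the insertion landed inside
   * that variable's text.
   */
  List<Span> insert(int offset, int count) {
    List<Span> changed = new ArrayList<>();
    for (Span span : spans) {
      boolean moveStart = span.start >= offset;
      boolean moveEnd = span.end >= offset;
      if (moveStart) {
        span.start += count;
      }
      if (moveEnd) {
        span.end += count;
      }
      if (moveStart != moveEnd) {
        changed.add(span);
      }
    }
    return changed;
  }
}
```

Whether an insertion exactly at a span's end offset counts as a change depends on whether end offsets are inclusive or exclusive; the sketch applies the "≥ O" rule as stated and leaves that choice open.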
Architecturally, variable processing knows nothing about text editing; it's just another step along the chain of document transformations.
> Why not?
It introduces more problems (edge cases) than it solves: empty documents, detection of partially matching variables, variable name at the end of the document, unmatched sigils, sigil insertion detection, sigil deletion, dynamic sigil changes, clipboard paste parsing and sigil detection, additional keyboard hooks, unit testing for off-by-one errors, supporting multiple sigil types within the same document, performance testing, and more. It's not trivial.
[0]: https://github.com/DaveJarvis/keenwrite
[1]: https://www.youtube.com/watch?v=cjQ-dle-tAE&list=PLB-WIt1cZY...
[2]: https://stackoverflow.com/a/40836618/59087
[3]: https://github.com/robert-bor/aho-corasick/
[4]: https://github.com/DaveJarvis/KeenWrite/tree/main/src/main/j...