JOE uses 4-level radix trees for character classes. These work well because many of the leaf nodes are identical and can be merged into a single shared copy. The resulting structure is often smaller than a binary tree. Character classes are also used for regular expressions, so there is code to build them on the fly from a list of ranges (it's tricky to do this efficiently).
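To make the leaf-merging idea concrete, here is a rough sketch in Python. It is not JOE's code (JOE is written in C), and the 5+5+5+6 bit split, the `CharClass` name, and its methods are all assumptions made for illustration. It builds the tree from a list of inclusive ranges and shares every identical node, taking a shortcut for subtrees that are entirely empty or entirely full so they are never walked code point by code point.

```python
# Hypothetical layout: a 21-bit code point split 5+5+5+6.
# JOE's real split and node representation may differ.
LEVEL_BITS = (5, 5, 5)   # three levels of index tables
LEAF_BITS = 6            # fourth level: a 64-bit bitmap leaf


class CharClass:
    def __init__(self, ranges):
        """ranges: iterable of inclusive (first, last) code point pairs."""
        self._interned = {}   # node value -> the one shared copy of it
        self.root = self._build(list(ranges), 0, 0)

    def _intern(self, node):
        # Identical leaves (ints) and tables (tuples) collapse to one object.
        return self._interned.setdefault(node, node)

    def _uniform(self, level, full):
        # A subtree whose code points are all in (or all out of) the class.
        if level == len(LEVEL_BITS):
            return self._intern((1 << (1 << LEAF_BITS)) - 1 if full else 0)
        child = self._uniform(level + 1, full)
        return self._intern((child,) * (1 << LEVEL_BITS[level]))

    def _build(self, ranges, level, base):
        span = 1 << (LEAF_BITS + sum(LEVEL_BITS[level:]))
        lo, hi = base, base + span - 1
        # Keep only the ranges that touch this node, clipped to its span.
        hits = [(max(a, lo), min(b, hi)) for a, b in ranges
                if a <= hi and b >= lo]
        if not hits:
            return self._uniform(level, False)
        if any(a == lo and b == hi for a, b in hits):
            return self._uniform(level, True)
        if level == len(LEVEL_BITS):
            bits = 0
            for a, b in hits:
                bits |= ((1 << (b - a + 1)) - 1) << (a - lo)
            return self._intern(bits)
        child_span = span >> LEVEL_BITS[level]
        return self._intern(tuple(
            self._build(hits, level + 1, lo + i * child_span)
            for i in range(1 << LEVEL_BITS[level])))

    def contains(self, cp):
        node = self.root
        shift = LEAF_BITS + sum(LEVEL_BITS)
        for bits in LEVEL_BITS:
            shift -= bits
            node = node[(cp >> shift) & ((1 << bits) - 1)]
        return bool((node >> (cp & ((1 << LEAF_BITS) - 1))) & 1)

    @property
    def node_count(self):
        # Number of distinct nodes actually stored after merging.
        return len(self._interned)


# Digits, ASCII uppercase, and the Emoticons block.
cc = CharClass([(0x30, 0x39), (0x41, 0x5A), (0x1F600, 0x1F64F)])
```

A lookup is three table indexes plus one bit test, whatever the class looks like. In this sketch `cc.node_count` stays small because almost all of the code space resolves to the same shared empty subtree.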
Anyway, I'm surprised that emoji are not double-wide characters.
JOE is still missing Unicode normalization for string searches.
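To show why this matters, here is a small Python illustration (not JOE code; the `normalized_contains` helper is a hypothetical name). The same visible text can be encoded with a precomposed character or with a base letter plus a combining mark, and a plain code-point search will not match one against the other unless both sides are normalized first.

```python
import unicodedata


def normalized_contains(haystack, needle, form="NFC"):
    """Substring test after normalizing both strings to the same form."""
    return (unicodedata.normalize(form, needle)
            in unicodedata.normalize(form, haystack))


text = "cafe\u0301 au lait"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
query = "caf\u00e9"           # precomposed U+00E9

plain_match = query in text                        # compares raw code points
normalized_match = normalized_contains(text, query)
```

Here `plain_match` is false and `normalized_match` is true, even though both strings render as the same word.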