I recently changed how JOE handles character widths. Originally it used Markus Kuhn's wcwidth function (http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c), but I've changed it to use the data in EastAsianWidth.txt: http://sourceforge.net/p/joe-editor/mercurial/ci/default/tre...

JOE uses 4-level radix trees for character classes. These work well because the leaf nodes are highly redundant and can be merged together. The resulting structure is often smaller than a binary tree. Character classes are also used for regular expressions, so there is code to build them on the fly from a list of ranges (it's tricky to do this efficiently).

Anyway, I'm surprised that emoji are not double-wide characters.

JOE is still missing Unicode normalization for string searches.