
We already did! That's what happened when the original 16-bit code space (UCS-2) was exhausted, which was never part of the original plan. Just as the IPv4 internet degraded into a mess of hacks like NAT once addresses ran short, so too did Unicode become wildly more complex.

Amongst other things, hitting the limit of 16 bits meant the introduction of:

- The concept of "planes"

- UTF-16 surrogate pairs

- UTF-32

- The newfound desire to encode emoji as sequences of code points (modifiers and joiners), which means many apparently simple emoji are actually hacked together out of a mini programming language (e.g. black man = man emoji + skin tone modifier). Same thing for flags, which are actually two Latin letters mapped into a different part of the code space (the regional indicator symbols) and then combined, e.g. the British flag is G+B.
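To make the emoji point concrete, here's a rough Python sketch of how those two examples are put together. The helper names (`regional_indicator`, `flag`) are mine, purely for illustration, and whether the result renders as a single glyph depends on your font and terminal.

```python
MAN = "\U0001F468"        # U+1F468 MAN
SKIN_TONE = "\U0001F3FF"  # U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-6

def regional_indicator(letter: str) -> str:
    """Map an ASCII letter A-Z to its regional indicator symbol (U+1F1E6..U+1F1FF)."""
    return chr(0x1F1E6 + ord(letter.upper()) - ord("A"))

def flag(country_code: str) -> str:
    """Build a flag sequence from a two-letter country code (hypothetical helper)."""
    return "".join(regional_indicator(c) for c in country_code)

man_dark = MAN + SKIN_TONE  # one visible emoji, two code points
gb_flag = flag("GB")        # one visible flag, two code points

print([f"U+{ord(c):04X}" for c in man_dark])  # ['U+1F468', 'U+1F3FF']
print([f"U+{ord(c):04X}" for c in gb_flag])   # ['U+1F1EC', 'U+1F1E7']
```

Note that nothing here is validated: `flag("ZZ")` happily produces a pair of regional indicators that no font will draw as a flag.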

It's one reason why emoji broke so much software. Before emoji, hardly anybody cared about characters beyond the Basic Multilingual Plane, and most software simply ignored them. Then emoji came along and broke everything that assumed a UTF-16 code unit == a character.
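You can see the broken assumption with a few lines of Python. This is just a sketch: `utf16_code_units` and `to_surrogate_pair` are names I made up, and the arithmetic is the standard UTF-16 surrogate encoding for code points above U+FFFF.

```python
from typing import Tuple

def utf16_code_units(text: str) -> int:
    """Count 16-bit code units, i.e. what a UTF-16 based string length reports."""
    return len(text.encode("utf-16-le")) // 2

def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into its high and low surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

grinning = "\U0001F600"               # one emoji outside the BMP
man_dark = "\U0001F468\U0001F3FF"     # man + skin tone modifier

print(len(grinning), utf16_code_units(grinning))   # 1 2
print(len(man_dark), utf16_code_units(man_dark))   # 2 4
print([hex(u) for u in to_surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```

Python's `len` counts code points, so even that is off for the second example: the user sees one character, Python sees two code points, and a UTF-16 string sees four code units.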



