After considering this problem at length in the past, I too favoured UTF-8 at the time.
I remember a project (circa 1999) I worked on: a feature-phone HTML 3.4 browser and email client (one of the first). The browser/IP stack handled only ASCII/code-page characters to begin with. To my surprise it was decided to encode text on the platform using UTF-16, so the entire code base was converted to use 16-bit code units (effectively UCS-2). On a resource-constrained platform (~300 KB of RAM, IIRC), it would, I think, have been better to update the renderer and email client to understand UTF-8.
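The memory cost is easy to demonstrate: for the ASCII-heavy markup a browser of that era mostly handled, UTF-16 simply doubles the byte count over UTF-8. A quick Python sketch (the sample string is just an illustration, not from the actual project):

```python
# UTF-8 stores ASCII in one byte per character; UTF-16 always uses
# at least two bytes per code unit, doubling ASCII-heavy text.
page = "<html><body><p>Hello, world!</p></body></html>"

utf8_bytes = page.encode("utf-8")
utf16_bytes = page.encode("utf-16-le")  # LE, no BOM, to count payload only

print(len(utf8_bytes), len(utf16_bytes))  # 46 vs 92: UTF-16 doubles it
```

On a platform with only a few hundred kilobytes of RAM for everything, halving the size of every buffered page and message is not a small win.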
Nice as it might be to think of a UTF-16 or UTF-32 unit as a "character", it is, as has been pointed out, not the case; and once you look into how languages actually work, you can see it never can be that simple.
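A small sketch of why even fixed-width UTF-32 doesn't buy you "one unit = one character": a user-perceived character (a grapheme cluster) can be built from several code points, as with a combining accent:

```python
# 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as a single
# "character" (é), yet it is two code points -- and therefore two
# 4-byte units even in UTF-32.
s = "e\u0301"

print(len(s))                        # 2 code points, displayed as one glyph
print(len(s.encode("utf-32-le")))    # 8 bytes: two 4-byte UTF-32 units
```

So no matter which fixed-width encoding you pick, text segmentation still has to happen above the code-point level.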