Regular UTF8, not WTF-8 or any of those other variants (which are for encoding d...

jorangreef · on Dec 24, 2021

Also excluding Unicode normalization? Or should that also be baked in?

roca · on Dec 24, 2021

No need to drag Unicode normalization into it; don't require strings to be normalized. Normalization is only relevant in very specific contexts and you don't want to pay for it elsewhere.

jorangreef · on Dec 25, 2021

Agreed, but I think many people would consider Unicode normalization part of what they want from the standard library when they say UTF8 should be baked in... so that they can manipulate UTF8 as they want, including converting between normal forms as each platform requires. It's hard to imagine people being satisfied without access to Unicode normalization.

For example, consider JS's introduction of String.normalize(). This is a slippery slope. It had a huge impact on Node's build process and binary sizes, because now all the normalization tables had to be shipped. But it's still broken in JS, because no matter what Unicode normalization support is provided, it will never match the exact tables used by a given platform, e.g. Apple's HFS+, which stores filenames in its own variant of decomposed form.

I feel that by the time it gets to String.normalize(), it's too far gone.
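
To make the point concrete, here is a minimal sketch in Python (not JS, and the sample string is just an illustration) of why normalization drags tables along with it. Two strings that render identically are different code point sequences and different UTF-8 bytes, and whether they compare equal after normalizing depends on the Unicode data bundled with the runtime, which need not match what a given filesystem uses.

```python
import unicodedata

# Two spellings of "café" that render identically.
composed = "caf\u00e9"     # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Plain UTF-8 handling sees two different strings and two different byte sequences.
strings_differ = composed != decomposed
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# Only after normalizing to the same form do they compare equal.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_after_nfd = unicodedata.normalize("NFD", composed) == decomposed

# The result is tied to the Unicode tables this runtime ships with.
tables_version = unicodedata.unidata_version

print(strings_differ, composed_bytes, decomposed_bytes)
print(same_after_nfc, same_after_nfd, tables_version)
```

A language that only guarantees valid UTF-8 can stop at the byte comparison; everything after that needs the tables.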