
Regular UTF-8, not WTF-8 or any of those other variants (which exist for encoding data that is not necessarily valid Unicode).
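To make the distinction concrete, here is a minimal sketch, assuming a runtime that provides the standard `TextEncoder` (browsers, Node, Deno). A JS string can hold an unpaired surrogate, which is not a Unicode scalar value. Strict UTF-8 cannot represent it, so the encoder substitutes U+FFFD; WTF-8 would instead encode U+D800 as the bytes `ED A0 80`. The `toHex` helper is hypothetical, for illustration only.

```typescript
// A JS string is a sequence of UTF-16 code units, so it can contain an
// unpaired surrogate such as "\uD800", which is not valid Unicode text.
const lone = "a\uD800b";

// TextEncoder always produces well-formed UTF-8: the unpaired surrogate is
// replaced with U+FFFD (bytes EF BF BD) instead of being encoded as-is.
const bytes: Uint8Array = new TextEncoder().encode(lone);

// Hypothetical helper: render bytes as space-separated lowercase hex.
function toHex(b: Uint8Array): string {
  return Array.from(b, (x) => x.toString(16).padStart(2, "0")).join(" ");
}

console.log(toHex(bytes)); // 61 ef bf bd 62
```

The original surrogate is lost on the round trip, which is exactly the data WTF-8 is designed to preserve.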



Also excluding Unicode normalization? Or should that also be baked in?


No need to drag Unicode normalization into it; don't require strings to be normalized. Normalization is only relevant in very specific contexts and you don't want to pay for it elsewhere.
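A minimal sketch of "pay for it only where it matters", using the standard `String.prototype.normalize`. Ordinary string equality compares code units and never normalizes, so canonically equivalent strings can compare unequal. The `canonicallyEqual` helper is hypothetical, standing in for the specific contexts (for example, matching user-supplied names) where normalization is actually needed.

```typescript
// "é" can be one code point (U+00E9, precomposed, as in NFC) or "e" followed
// by a combining acute accent (U+0065 U+0301, decomposed, as in NFD).
const composed = "\u00E9";
const decomposed = "e\u0301";

// Plain comparison is code-unit equality: no normalization is applied.
const rawEqual = composed === decomposed;
console.log(composed.length, decomposed.length); // 1 2

// Hypothetical helper: normalize both sides to NFC only when canonical
// equivalence is what the caller actually needs.
function canonicallyEqual(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(rawEqual, canonicallyEqual(composed, decomposed)); // false true
```

Everywhere else, strings are compared and stored as-is, with no normalization tables involved.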


Agreed, but I think many people, when they say UTF-8 should be baked in, would consider Unicode normalization part of what they want from the std lib: the ability to manipulate UTF-8 however they like, including converting to whichever normal form a given platform expects. It's hard to imagine people being satisfied without access to Unicode normalization.

For example, consider JS's introduction of String.normalize(). This is a slippery slope. It had a huge impact on Node's build process and binary sizes, because all the normalization tables now had to be shipped. Yet it's still broken in JS, because no matter what Unicode normalization support is provided, it will never exactly match the tables used elsewhere, e.g., in Apple's HFS.

I feel that by the time it gets to String.normalize(), it's too far gone.



