
That example proves the point: if UTF-8 exists, ASCII will exist.

It could have gone the other way: if UTF-16 were the ONLY encoding, then ASCII would be obsolete. But that didn't happen.




UTF-8 is backwards compatible with ASCII “as she is spake”, but not strictly speaking with ASCII, as ASCII control characters will break UTF-8. It also breaks any 8-bit extensions/code pages. ASCII vs HTML is a bad example, though, because HTML is used globally, and although ASCII is too, that’s more a historical artefact. It’s not hard to imagine ASCII dying out over the next few years while HTML continues to adapt to every encoding under the sun and pure ASCII becomes used less and less ...


The C1 block isn't ASCII. UTF8 is a perfect superset of 7-bit ASCII.
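
A minimal sketch to convince yourself (class name is mine; any modern JDK):

    import java.nio.charset.StandardCharsets;

    public class AsciiVsUtf8 {
        public static void main(String[] args) {
            // All 128 7-bit ASCII bytes decode identically under
            // US-ASCII and UTF-8.
            byte[] ascii = new byte[128];
            for (int i = 0; i < 128; i++) ascii[i] = (byte) i;
            String a = new String(ascii, StandardCharsets.US_ASCII);
            String u = new String(ascii, StandardCharsets.UTF_8);
            System.out.println(a.equals(u)); // true

            // A lone C1 byte such as 0x85 is a UTF-8 continuation byte
            // with no lead byte; the decoder substitutes U+FFFD.
            String c1 = new String(new byte[] { (byte) 0x85 }, StandardCharsets.UTF_8);
            System.out.println((int) c1.charAt(0)); // 65533 (U+FFFD)
        }
    }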


Nope. If you read an ASCII file with control characters in Java you’ll get an exception, and it won’t work with the 8-bit ASCII variants either. Neither are “true Scotsmen”, of course, but the point still stands that HTML could yet be more durable.
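
The 8-bit variants are easy to demonstrate (a quick sketch, JDK 11+; the file name is made up):

    import java.nio.charset.MalformedInputException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class Latin1Break {
        public static void main(String[] args) throws Exception {
            // 0xE9 is 'é' in Latin-1 but a truncated multi-byte
            // sequence in UTF-8, so the strict decoder throws.
            Path p = Files.write(Path.of("latin1.txt"), new byte[] { (byte) 0xE9 });
            try {
                Files.readString(p); // decodes as UTF-8, strictly
            } catch (MalformedInputException e) {
                System.out.println("not valid UTF-8");
            }
        }
    }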


I think you’re confusing Unicode and utf8. Java uses Unicode but not utf8; by default it uses a 2-byte encoding (utf16) with surrogate pairs.

ASCII is utf8, but it’s not utf16. ASCII will be around for as long as utf8 is.
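
The byte-level view makes it obvious (a sketch; UTF_16BE just avoids the BOM that UTF_16 prepends):

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class Widths {
        public static void main(String[] args) {
            // ASCII text is byte-for-byte identical in UTF-8, but
            // doubles in size (and gains NUL bytes) in UTF-16.
            System.out.println(Arrays.toString("A".getBytes(StandardCharsets.US_ASCII))); // [65]
            System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_8)));    // [65]
            System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_16BE))); // [0, 65]
        }
    }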


I’m almost certain the default encoding for reading/writing files in Java is UTF-8 now, and similarly for source files. I don’t think I encounter wide-char data much at all day to day ...
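
It is since JDK 18 (JEP 400, “UTF-8 by Default”); easy to check:

    import java.nio.charset.Charset;

    public class DefaultEncoding {
        public static void main(String[] args) {
            // Prints UTF-8 on JDK 18+; platform-dependent on older
            // JDKs unless -Dfile.encoding=UTF-8 is set.
            System.out.println(Charset.defaultCharset());
        }
    }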



