
That example proves the point: if UTF-8 exists, ASCII will exist.

It could have gone the other way: if UTF-16 were the ONLY encoding, then ASCII would be obsolete. But that didn't happen.




UTF-8 is backwards compatible with ASCII “as she is spake”, but not strictly speaking with ASCII, as ASCII control characters will break UTF-8. It also breaks any 8-bit extensions/code pages. ASCII vs HTML is a bad example, though, because HTML is used globally, and although ASCII is too, that’s more a historical artefact. It’s not hard to imagine ASCII dying out over the next few years while HTML continues to adapt to every encoding under the sun and pure ASCII becomes used less and less ...


The C1 block isn't ASCII. UTF8 is a perfect superset of 7-bit ASCII.
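
A minimal sketch to convince yourself (class name is mine; any modern JDK):

    import java.nio.charset.StandardCharsets;

    public class AsciiVsUtf8 {
        public static void main(String[] args) {
            // All 128 7-bit ASCII bytes decode identically under
            // US-ASCII and UTF-8.
            byte[] ascii = new byte[128];
            for (int i = 0; i < 128; i++) ascii[i] = (byte) i;
            String a = new String(ascii, StandardCharsets.US_ASCII);
            String u = new String(ascii, StandardCharsets.UTF_8);
            System.out.println(a.equals(u)); // true

            // A lone C1 byte such as 0x85 is a UTF-8 continuation byte
            // with no lead byte; the decoder substitutes U+FFFD.
            String c1 = new String(new byte[] { (byte) 0x85 }, StandardCharsets.UTF_8);
            System.out.println((int) c1.charAt(0)); // 65533 (U+FFFD)
        }
    }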


Nope. If you read an ASCII file with control characters in Java you’ll get an exception, and it won’t work with the 8-bit ASCII variants either. Neither are “true Scotsmen”, of course, but the point still stands that HTML could yet be more durable.
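
The 8-bit variants are easy to demonstrate (a quick sketch, JDK 11+; the file name is made up):

    import java.nio.charset.MalformedInputException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class Latin1Break {
        public static void main(String[] args) throws Exception {
            // 0xE9 is 'é' in Latin-1 but a truncated multi-byte
            // sequence in UTF-8, so the strict decoder throws.
            Path p = Files.write(Path.of("latin1.txt"), new byte[] { (byte) 0xE9 });
            try {
                Files.readString(p); // decodes as UTF-8, strictly
            } catch (MalformedInputException e) {
                System.out.println("not valid UTF-8");
            }
        }
    }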


I think you’re confusing Unicode and utf8. Java uses Unicode but not utf8; by default it uses a 2-byte encoding (utf16) with surrogate pairs.

ASCII is utf8, but it’s not utf16. ASCII will be around for as long as utf8 is.
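
The byte-level view makes it obvious (a sketch; UTF_16BE just avoids the BOM that UTF_16 prepends):

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class Widths {
        public static void main(String[] args) {
            // ASCII text is byte-for-byte identical in UTF-8, but
            // doubles in size (and gains NUL bytes) in UTF-16.
            System.out.println(Arrays.toString("A".getBytes(StandardCharsets.US_ASCII))); // [65]
            System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_8)));    // [65]
            System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_16BE))); // [0, 65]
        }
    }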


I’m almost certain the default encoding for reading/writing files in Java is UTF-8 now, and similarly for source files. I don’t think I encounter wide-char data much at all day to day ...
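
It is since JDK 18 (JEP 400, “UTF-8 by Default”); easy to check:

    import java.nio.charset.Charset;

    public class DefaultEncoding {
        public static void main(String[] args) {
            // Prints UTF-8 on JDK 18+; platform-dependent on older
            // JDKs unless -Dfile.encoding=UTF-8 is set.
            System.out.println(Charset.defaultCharset());
        }
    }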



