The narrative that Han unification is this thing imposed by ignorant westerners ...

mikepurvis · on June 7, 2023

Is there a longer-form piece that would be helpful for me to read giving more of this perspective and showing receipts on it? As someone with little knowledge outside the Wikipedia article, it seems to me most damning that Shift JIS is still so popular in Japan.

kps · on June 7, 2023

Unicode 1.0 Volume 2 Chapter 2 — https://www.unicode.org/versions/Unicode1.0.0/V2ch02.pdf

serentty · on June 7, 2023

I wish I could think of a good long-form article about this. The best I can think of offhand is a fairly informative FAQ about Japanese encodings.

https://www.sljfaq.org/afaq/encodings.html

What I will point out, though, and what you can easily verify with a simple search, is that Shift-JIS can be represented losslessly in Unicode, and that this has been true for as long as Unicode CJK has existed.
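For anyone who would rather check this than take my word for it, here is a rough sketch in Python. It assumes Python's built-in `shift_jis` codec is a fair stand-in for "Shift-JIS"; the helper names `survives_round_trip` and `round_trip_failures` are just ones I made up for illustration. It decodes bytes to Unicode, encodes them back, and checks whether the original bytes come out the other end.

```python
def survives_round_trip(raw: bytes, codec: str = "shift_jis") -> bool:
    """True if decoding raw to Unicode and encoding it back yields the same bytes."""
    return raw.decode(codec).encode(codec) == raw


def round_trip_failures(codec: str = "shift_jis") -> list[bytes]:
    """Collect single characters whose byte form changes after a trip through Unicode."""
    # Every one-byte sequence, plus every plausible lead/trail byte pair.
    candidates = [bytes([b]) for b in range(0x100)]
    candidates += [bytes([lead, trail])
                   for lead in range(0x81, 0x100)
                   for trail in range(0x40, 0x100)]
    failures = []
    for raw in candidates:
        try:
            text = raw.decode(codec)
        except UnicodeDecodeError:
            continue  # not a valid sequence in this codec
        if len(text) != 1:
            continue  # two single-byte characters, already covered above
        try:
            if text.encode(codec) != raw:
                failures.append(raw)
        except UnicodeEncodeError:
            failures.append(raw)
    return failures


sample = "日本語のテキスト、ｶﾀｶﾅ".encode("shift_jis")
print(survives_round_trip(sample))
print(len(round_trip_failures("shift_jis")))
```

The second number depends on the mapping tables of whichever codec you pass in. Vendor variants such as `cp932` contain duplicate entries for a few characters, so they are worth trying separately.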

The continued popularity of Shift-JIS is worth noting, but it is also important to note that its use is not holding steady: it has been declining for two decades. The most popular websites in Japan no longer use it, and among smaller websites, the percentage that use it gets smaller each year. Secondly, it gives the user absolutely no benefit in terms of what can be encoded and distinguished, because the text will get converted into Unicode at some stage in the pipeline anyway. Any modern text renderer used in a GUI toolkit will use Unicode internally. “Weird byte sequences to indicate Unicode characters” is really all that legacy encodings are from the perspective of new software. They are essentially just incompatible ways of specifying a subset of Unicode characters.
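To make the "weird byte sequences" point concrete, here is a small sketch using Python's built-in codecs. The choice of sample string and of these four codecs is mine, purely for illustration. The same three characters come out as different bytes in each encoding, but they all decode to the same Unicode code points.

```python
text = "日本語"
codecs_to_compare = ["shift_jis", "euc_jp", "iso2022_jp", "utf-8"]

# Encode the same text with each codec.
encoded = {name: text.encode(name) for name in codecs_to_compare}

for name, raw in encoded.items():
    # Decode the bytes again and list the resulting Unicode code points.
    code_points = [f"U+{ord(ch):04X}" for ch in raw.decode(name)]
    print(f"{name:12} {raw.hex(' ')}  ->  {' '.join(code_points)}")

# Every codec decodes back to the same string, so this set has one element.
decoded = {raw.decode(name) for name, raw in encoded.items()}
print(decoded == {text})
```

The byte columns differ from row to row while the code point columns are identical, which is all a renderer working in Unicode ever sees.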

As for why it has held on so long, there are a few reasons. The Japanese tech world can be quite conservative and skeptical about new things. That is one factor. But I think another is that Japan was really a computing pioneer in the 1980s, and local standards reigned supreme. Compatibility was not a big concern, and even the mighty IBM PC and its clones barely made an impact there for a long time, as they were completely eclipsed by Japanese alternatives. Now, everyone is forced by our increasingly interconnected world to work on international standards, and I can’t help but feel that there is some resentment at not being able to just “do their own thing” anymore. Every time a new encoding extension is proposed, they have to present it to an organization that includes China, Korea, Taiwan, and Vietnam, who will scrutinize it. A few years ago JIS (the Japanese standards organization) actually proposed that each East Asian country should just get its own blocks in Unicode and be able to encode whatever it wants with no input from the others. Of course, none of the other East Asian countries took the proposal seriously. I wish I could find the proposal, because I hate saying stuff like this without sources, but I can tell you that it is buried somewhere among all of the proposal documents on the website of the Ideographic Research Group, the organization in which East Asian countries participate and which is responsible for CJK encoding in Unicode. You might find it here with enough scrolling. I have to get on the subway though, so I have to end this comment here.

https://appsrv.cse.cuhk.edu.hk/~irg/