The narrative that Han unification is something imposed by ignorant Westerners on East Asian computer users is simply not true. The criteria for which characters got unified were largely drawn from the legacy East Asian encodings, which already had to deal with the question of what counts as the same character and what does not.
Unicode has round-trip compatibility with the old encodings, too, without any of that extra metadata, which is only needed for incredibly minor character variations that are almost never semantically meaningful the way capitalization is. A human copyist might swap one for another when copying a text by hand, just as, when copying English, you do not consider it semantically meaningful whether a lowercase “a” is drawn as a circle and a line or as a circle and a hook.
Is there a longer-form piece I could read that gives more of this perspective and shows the receipts? As someone with little knowledge beyond the Wikipedia article, the most damning thing to me is that Shift-JIS is still so popular in Japan.
What I will point out, though, and what you can easily verify with a simple search, is that Shift-JIS can be represented losslessly in Unicode, and that this has been true for as long as Unicode CJK has existed.
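To make “losslessly” concrete, here is a minimal sketch of the round trip, Shift-JIS bytes to a Unicode string and back. It assumes the encoding_rs crate, which is just my pick for the example; the bytes are the standard Shift-JIS encoding of 日本語.

```rust
// Assumes a dependency on the encoding_rs crate (e.g. encoding_rs = "0.8").
use encoding_rs::SHIFT_JIS;

fn main() {
    // "日本語" in its Shift-JIS byte representation.
    let sjis_bytes: &[u8] = &[0x93, 0xFA, 0x96, 0x7B, 0x8C, 0xEA];

    // Legacy bytes -> Unicode string.
    let (text, _, had_errors) = SHIFT_JIS.decode(sjis_bytes);
    assert!(!had_errors);
    assert_eq!(text, "日本語");

    // Unicode string -> the exact same legacy bytes. Nothing is lost.
    let (round_tripped, _, unmappable) = SHIFT_JIS.encode(&text);
    assert!(!unmappable);
    assert_eq!(&round_tripped[..], sjis_bytes);
}
```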
The continued popularity of Shift-JIS is worth noting, but its continued use is not a stable thing: it has been declining for two decades. The most popular websites in Japan no longer use it, and among smaller websites the percentage that do shrinks every year. On top of that, there is absolutely no benefit to the user in terms of what can be encoded and distinguished, because the text will get converted into Unicode at some stage in the pipeline anyway. Any modern text renderer used in a GUI toolkit works with Unicode internally. “Weird byte sequences to indicate Unicode characters” is really all that legacy encodings are from the perspective of new software; they are essentially just incompatible ways of specifying a subset of Unicode characters.
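A tiny sketch of that last point (again assuming encoding_rs, purely my choice for the example): Shift-JIS and UTF-8 are simply two incompatible byte serializations that decode to the identical Unicode string.

```rust
// Assumes a dependency on the encoding_rs crate (e.g. encoding_rs = "0.8").
use encoding_rs::{SHIFT_JIS, UTF_8};

fn main() {
    let text = "文字コード";

    // Two different byte representations of the same characters.
    let (sjis_bytes, _, _) = SHIFT_JIS.encode(text);
    let utf8_bytes = text.as_bytes();
    assert_ne!(&sjis_bytes[..], utf8_bytes);

    // Decoding either one lands on the identical Unicode string.
    let (from_sjis, _, _) = SHIFT_JIS.decode(&sjis_bytes);
    let (from_utf8, _, _) = UTF_8.decode(utf8_bytes);
    assert_eq!(from_sjis, from_utf8);
}
```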
As for why it has held on so long, there are a few reasons. The Japanese tech world can be quite conservative and skeptical of new things; that is one factor. But I think another is that Japan was a real computing pioneer in the 1980s, and local standards ruled supreme. Compatibility was not a big concern, and even the mighty IBM PC and its clones barely made an impact there for a long time, completely eclipsed by Japanese alternatives. Now everyone is forced by our increasingly interconnected world to work on international standards, and I can’t help but feel that there is some resentment at not being able to just “do their own thing” anymore. Every time a new encoding extension is proposed, it has to be presented to an organization that includes China, Korea, Taiwan, and Vietnam, all of whom will scrutinize it. A few years ago JIS (the Japanese standards organization) actually proposed that each East Asian country should just get its own blocks in Unicode and be able to encode whatever it wants with no input from the others. Of course, none of the other East Asian countries took the proposal seriously. I wish I could find it, because I hate saying things like this without sources, but I can tell you it is buried somewhere among the proposal documents on the website of the Ideographic Research Group, the body through which East Asian countries handle CJK encoding for Unicode. You might find it there with enough scrolling. I have to get on the subway though, so I have to end this comment here.
It doesn’t matter. If C is supposed to be a language for solving real-world problems, its strings have to be real-world strings. Wishing writing were more elegant gets you nowhere.
There is a lot more to Rust than just “safety”. That gets talked about a lot, but Rust is full of great features that make it worthwhile to use even when you are working in unsafe Rust.
On top of that, it should still be possible to write the application logic in safe Rust, even if you need to use unsafe for FFI with Windows stuff.
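As a rough sketch of what that split can look like (a hypothetical example, not anything from the project being discussed): the unsafe FFI surface is confined to one tiny wrapper, and everything that calls it is ordinary safe Rust.

```rust
use std::ffi::c_void;

// Hand-written binding for one Win32 function; crates like windows-sys
// would normally generate this for you.
#[link(name = "user32")]
extern "system" {
    fn MessageBoxW(hwnd: *mut c_void, text: *const u16, caption: *const u16, utype: u32) -> i32;
}

/// Safe wrapper: callers never see raw pointers or `unsafe`.
fn show_message(caption: &str, text: &str) -> i32 {
    // Win32 expects NUL-terminated UTF-16 strings.
    let to_wide = |s: &str| -> Vec<u16> { s.encode_utf16().chain(Some(0)).collect() };
    let caption_w = to_wide(caption);
    let text_w = to_wide(text);

    // The only unsafe block: a single, well-understood FFI call with
    // pointers that stay valid for the duration of the call.
    unsafe { MessageBoxW(std::ptr::null_mut(), text_w.as_ptr(), caption_w.as_ptr(), 0) }
}

fn main() {
    // Application logic stays entirely in safe Rust.
    show_message("Hello", "Called into a Win32 API through a safe wrapper.");
}
```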
That is what happens when you have a page that is genuinely tiny even by the standards of the early web, served over a modern internet connection, and cached so the request does not have to go all the way to the server. The web could be so much faster than it is.
I actually added a link that bypasses CloudFlare entirely. I want the page to remain accessible regardless, so people can see what is on it, and CloudFlare helps with that; but I also know that watching the page load slowly from the real machine is part of the experience many people are there for, so I tried to find a way to have my cake and eat it too.
Oh wow, I know you from the README! It really is something seeing you here. It is not a very impressive site at the moment but the immense amount of traffic that I have been getting is encouraging me to grow it into something with some actual content. Probably something retro related in some way.
Yeah, I find the slow loading really fun. Unfortunately, when I install the Ethernet card I imagine it will get a fair bit faster. But on the bright side, I think it will reduce the need for CloudFlare, which helps in terms of authenticity.
I don’t know too much about how CloudFlare works, but it looks like you got a direct connection! Congratulations! My guess is that your DNS resolver has not picked up the CloudFlare records yet.