The narrative that Han unification is something imposed by ignorant Westerners on East Asian computer users is simply not true. The criteria for which characters got unified were largely drawn from the legacy East Asian encodings, which already had to deal with the question of what counts as the same character and what does not.
Unicode has round-trip compatibility with the old encodings, too, without any of that extra metadata, which is only needed for incredibly minor character variations that are almost never semantically meaningful the way capitalization is. A human copyist might swap one for another when copying a text by hand, just as, when copying English, you do not consider it semantically meaningful whether a lowercase “a” is drawn as a circle and a line or as a circle and a hook.
Is there a longer-form piece I could read that gives more of this perspective and shows the receipts? As someone with little knowledge beyond the Wikipedia article, the most damning thing to me is that Shift-JIS is still so popular in Japan.
What I will point out, though, and what you can easily verify with a simple search, is that Shift-JIS can be represented losslessly in Unicode, and that this has been true for as long as Unicode CJK has existed.
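To make “losslessly” concrete, here is a minimal sketch of the round trip, Shift-JIS bytes to a Unicode string and back. It assumes the encoding_rs crate, which is just my pick for the example; the bytes are the standard Shift-JIS encoding of 日本語.

```rust
// Assumes a dependency on the encoding_rs crate (e.g. encoding_rs = "0.8").
use encoding_rs::SHIFT_JIS;

fn main() {
    // "日本語" in its Shift-JIS byte representation.
    let sjis_bytes: &[u8] = &[0x93, 0xFA, 0x96, 0x7B, 0x8C, 0xEA];

    // Legacy bytes -> Unicode string.
    let (text, _, had_errors) = SHIFT_JIS.decode(sjis_bytes);
    assert!(!had_errors);
    assert_eq!(text, "日本語");

    // Unicode string -> the exact same legacy bytes. Nothing is lost.
    let (round_tripped, _, unmappable) = SHIFT_JIS.encode(&text);
    assert!(!unmappable);
    assert_eq!(&round_tripped[..], sjis_bytes);
}
```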
The continued popularity of Shift-JIS is worth noting, but its continued use is not a stable thing: it has been declining for two decades. The most popular websites in Japan no longer use it, and among smaller websites the percentage that do shrinks every year. On top of that, there is absolutely no benefit to the user in terms of what can be encoded and distinguished, because the text will get converted into Unicode at some stage in the pipeline anyway. Any modern text renderer used in a GUI toolkit works with Unicode internally. “Weird byte sequences to indicate Unicode characters” is really all that legacy encodings are from the perspective of new software; they are essentially just incompatible ways of specifying a subset of Unicode characters.
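A tiny sketch of that last point (again assuming encoding_rs, purely my choice for the example): Shift-JIS and UTF-8 are simply two incompatible byte serializations that decode to the identical Unicode string.

```rust
// Assumes a dependency on the encoding_rs crate (e.g. encoding_rs = "0.8").
use encoding_rs::{SHIFT_JIS, UTF_8};

fn main() {
    let text = "文字コード";

    // Two different byte representations of the same characters.
    let (sjis_bytes, _, _) = SHIFT_JIS.encode(text);
    let utf8_bytes = text.as_bytes();
    assert_ne!(&sjis_bytes[..], utf8_bytes);

    // Decoding either one lands on the identical Unicode string.
    let (from_sjis, _, _) = SHIFT_JIS.decode(&sjis_bytes);
    let (from_utf8, _, _) = UTF_8.decode(utf8_bytes);
    assert_eq!(from_sjis, from_utf8);
}
```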
As for why it has held on so long, there are a few reasons. The Japanese tech world can be quite conservative and skeptical of new things; that is one factor. But I think another is that Japan was a real computing pioneer in the 1980s, and local standards ruled supreme. Compatibility was not a big concern, and even the mighty IBM PC and its clones barely made an impact there for a long time, completely eclipsed by Japanese alternatives. Now everyone is forced by our increasingly interconnected world to work on international standards, and I can’t help but feel that there is some resentment at not being able to just “do their own thing” anymore. Every time a new encoding extension is proposed, it has to be presented to an organization that includes China, Korea, Taiwan, and Vietnam, all of whom will scrutinize it. A few years ago JIS (the Japanese standards organization) actually proposed that each East Asian country should just get its own blocks in Unicode and be able to encode whatever it wants with no input from the others. Of course, none of the other East Asian countries took the proposal seriously. I wish I could find it, because I hate saying things like this without sources, but I can tell you it is buried somewhere among the proposal documents on the website of the Ideographic Research Group, the body through which East Asian countries handle CJK encoding for Unicode. You might find it there with enough scrolling. I have to get on the subway though, so I have to end this comment here.
It doesn’t matter. If C is supposed to be a language for solving real-world problems, its strings have to be real-world strings. Wishing writing were more elegant gets you nowhere.
There is a lot more to Rust than just “safety”. That gets talked about a lot, but Rust is full of great features that make it worthwhile to use even when you are working in unsafe Rust.
On top of that, it should still be possible to write the application logic in safe Rust, even if you need to use unsafe for FFI with Windows stuff.
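As a rough sketch of what that split can look like (a hypothetical example, not anything from the project being discussed): the unsafe FFI surface is confined to one tiny wrapper, and everything that calls it is ordinary safe Rust.

```rust
use std::ffi::c_void;

// Hand-written binding for one Win32 function; crates like windows-sys
// would normally generate this for you.
#[link(name = "user32")]
extern "system" {
    fn MessageBoxW(hwnd: *mut c_void, text: *const u16, caption: *const u16, utype: u32) -> i32;
}

/// Safe wrapper: callers never see raw pointers or `unsafe`.
fn show_message(caption: &str, text: &str) -> i32 {
    // Win32 expects NUL-terminated UTF-16 strings.
    let to_wide = |s: &str| -> Vec<u16> { s.encode_utf16().chain(Some(0)).collect() };
    let caption_w = to_wide(caption);
    let text_w = to_wide(text);

    // The only unsafe block: a single, well-understood FFI call with
    // pointers that stay valid for the duration of the call.
    unsafe { MessageBoxW(std::ptr::null_mut(), text_w.as_ptr(), caption_w.as_ptr(), 0) }
}

fn main() {
    // Application logic stays entirely in safe Rust.
    show_message("Hello", "Called into a Win32 API through a safe wrapper.");
}
```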
That is what happens when you have a page that is genuinely tiny even by the standards of the early web, served over a modern internet connection, and cached so the request does not have to go all the way to the server. The web could be so much faster than it is.
I actually added a link that bypasses CloudFlare entirely. I want the page to remain accessible regardless, so people can see what is on it, and CloudFlare helps with that; but I also know that watching the page load slowly from the real machine is part of the experience many people are there for, so I tried to find a way to have my cake and eat it too.
Oh wow, I know you from the README! It really is something seeing you here. It is not a very impressive site at the moment but the immense amount of traffic that I have been getting is encouraging me to grow it into something with some actual content. Probably something retro related in some way.
Yeah, I find the slow loading really fun. Unfortunately, when I install the Ethernet card I imagine it will get a fair bit faster. But on the bright side, I think it will reduce the need for CloudFlare, which helps in terms of authenticity.
I don’t know too much about how CloudFlare works, but it looks like you got a direct connection! Congratulations! My guess is that your DNS resolver has not picked up the CloudFlare records yet.