For those interested in Chinese-language computing, there's a fascinating rabbit hole.
In too-brief summary, character-based Chinese has almost died twice in modern history.
First, with the typewriter, where a mechanically unique typewriter had to be invented to span the Chinese character set.
Second, with early low-bit computers, where the address space was insufficient to span the character set, and what's essentially a multi-character-to-single-character encoding had to be invented.
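The multi-character workaround described here survives in legacy encodings like GB2312, where each Chinese character is stored as a pair of bytes so that thousands of characters fit in an 8-bit-byte world. A quick sketch in Python (assuming nothing beyond the standard library's codecs):

```python
# One Chinese character, two legacy bytes: GB2312 packs each character
# into a two-byte pair, while UTF-8 needs three bytes for the same one.
ch = "中"  # "middle", as in 中国 (China)

gb = ch.encode("gb2312")
utf8 = ch.encode("utf-8")

print(len(ch))    # 1 character
print(len(gb))    # 2 bytes in GB2312
print(len(utf8))  # 3 bytes in UTF-8
```

The two GB2312 bytes are each drawn from a range outside printable ASCII, which is how software of that era told single-byte Latin text apart from double-byte Chinese text in the same stream.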
Chinese characters being unsuitable for QWERTY typewriters, and early computers' limitations, didn't "almost kill character-based Chinese"; it just meant Chinese computing lagged behind. Inputting Chinese characters with a limited key set wasn't even conceptually novel, given the long history of structural decomposition and phonetic representation of Chinese characters; it just couldn't be implemented before computers were powerful enough.
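The phonetic route mentioned above is essentially how most people type Chinese today: you type a Latin-letter syllable and pick from a candidate list. A minimal sketch, with a made-up two-entry dictionary standing in for a real IME database:

```python
# Toy pinyin input method: map a typed syllable to candidate characters.
# The dictionary below is illustrative only, not a real IME database.
candidates = {
    "ma": ["马", "妈", "吗", "骂"],       # horse, mother, question particle, to scold
    "zhong": ["中", "种", "重", "钟"],    # middle, kind, heavy, bell
}

def lookup(pinyin: str) -> list[str]:
    """Return candidate characters for a pinyin syllable, if any."""
    return candidates.get(pinyin, [])

print(lookup("ma"))   # ['马', '妈', '吗', '骂']
print(lookup("xyz"))  # []
```

A real input method also ranks candidates by frequency and context, which is exactly the part that needed computers powerful enough to be practical.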
It is a fact that Mao supported romanization. <https://np.reddit.com/r/todayilearned/comments/54ky8m/til_th...> Had the PRC switched decades ago, Taiwan would no doubt today cite its sticking with the traditional written language as yet more proof of being the "real" China, while the PRC would point to the millions of illiterates taught to read in the Western alphabet as proof of the superiority of its approach. (And vice versa, of course, had Taiwan been the one to romanize.)
if like me you want to see the typewriter, here's [0] a short blurb with it in use. seems like it'd be quite difficult to use. fewer parts to break down than a standard typewriter though, surprisingly.
While interesting, that podcast episode is just a watered down version of Tom Mullaney's work. Here's the man himself covering this subject in more depth and with more seriousness:
When you learn Chinese, they teach you that the more complex words are just compounds of smaller words put into a single character. I imagine tokenizing it isn't much different than other languages -- for the complex characters you can tokenize it into its smaller parts.
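Worth noting that modern byte-level tokenizers (e.g. BPE over UTF-8) don't see characters or components at all; a single Chinese character already arrives as three bytes. Structural decomposition is a separate idea. A sketch of both views, with a three-entry toy table standing in for a real decomposition database:

```python
# Two ways a machine can "split" a Chinese character:
# 1) as raw UTF-8 bytes (what byte-level tokenizers actually consume);
# 2) as structural components (the compound structure described above).
# The components table is a toy illustration, not a real database.
components = {
    "好": ["女", "子"],  # good = woman + child
    "明": ["日", "月"],  # bright = sun + moon
    "妈": ["女", "马"],  # mother = woman + horse (phonetic part)
}

ch = "明"
print(len(ch))                  # 1 code point
print(len(ch.encode("utf-8")))  # 3 UTF-8 bytes
print(components[ch])           # ['日', '月']
```

So a byte-level model splits 明 into three opaque bytes, not into 日 and 月; whether it ever recovers the component structure is learned, not built in.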
More detail: https://www.radiolab.org/podcast/wubi-effect-2308
It'll be curious to see how LLMs evolve, given how distinct Chinese is from alphabetic scripts.