> See, here is the thing. At the end of the day, hypotheticals are worthless, co...

Crito · on Jan 16, 2014

"If you don't know of a solution yourself, shut up." Similar to "if you can't play guitar as well as <a player>, you don't get to have an opinion".

I'm not telling you to shut up. I am telling you to not act offended that a tangible working solution was chosen over a hypothetical solution. In other words, don't act like the universe is unfair because Paul McCartney is famous for songwriting while you are not, even though you totally could have hypothetically written better songs.

> "I don't see how, given an alternative history, computers wouldn't favour for example the Russian alphabet."

In an alternative universe where CP1251 was picked as the basis of the first block in Unicode instead of ASCII, it would have been for the same reasons that ASCII was picked in this universe.

In that universe, you'd just be complaining that Unicode was Russo-centric.

What reason, in this universe, would there have been to go that route?

> Compressing as in something like Huffman encoding. Maybe I was misusing the names.

http://en.wikipedia.org/wiki/Lossless_compression#Limitation...

Huffman encoding is a method used for lossless compression of particular texts. It does not let you put more than 256 characters into a single byte in a character encoding.

The guys that made JIS X 0212 were not missing something when they made JIS X 0208, a two byte encoding, prior to Unicode.

> And the fact that IT is Anglo centric goes way beyond Shannon entropy.

Okay. Complain about instances where it actually exists, and in discussions where it is actually relevant.

Dewie · on Jan 16, 2014

> In that universe, you'd just be complaining that Unicode was Russo-centric.

Yes, obviously.

> It does not let you put more than 256 characters into a single byte in a character encoding.

Which I have never claimed. (EDIT: I think we're talking past each other: my point was that things like Huffman encoding encodes the most frequent data with the lowest amount of bits. I don't know how UTF-8 is implemented, but it seems conceptually similar. There is a reason that I didn't want to get anywhere near the nitty-gritty of this.)

> Okay. Complain about instances where it actually exists, and in discussions where it is actually relevant.

Oh yes, I will complain.

Crito · on Jan 16, 2014

A character coding has an equal distribution of each code point. Each code point is represented once.

"For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding."

Huffman encoding something written in Japanese is useful. It is not useful for creating a Japanese character set.

Get it?

If you don't buy it, then try it on pen and paper. Imagine a hypothetical 10-character alphabet, and try to devise an encoding that will let you fit it into a two-bit word, without going multi-word. Use prefix codes or whatever you want.

It's not going to happen. You also aren't going to get 10k characters into an 8-bit/word single-word character set.