
> See, here is the thing. At the end of the day, hypotheticals are worthless, concrete solutions are all that matters. It isn't anglocentricism that we picked the solution that is actually fleshed out and works over the vague hypothetical solution. It's "get-shit-done"-ism

"If you don't know of a solution yourself, shut up." Similar to "if you can't play guitar as well as <a player>, you don't get to have an opinion".

Admittedly in this context I might as well have thought I had something better to offer, given my original post. But as I've said, I don't. It was more of a historical note. I don't see how, given an alternative history, computers wouldn't favour for example the Russian alphabet.

And while we're at it, you might lecture me on how text/ASCII-centered protocols are superior to a binary format. Because I honestly don't know.

And the fact that IT is Anglo centric goes way beyond Shannon entropy.

> You are forgetting the pigeonhole principle: http://en.wikipedia.org/wiki/Pigeonhole_principle

Compressing as in something like Huffman encoding. Maybe I was misusing the names.




> "If you don't know of a solution yourself, shut up." Similar to "if you can't play guitar as well as <a player>, you don't get to have an opinion".

I'm not telling you to shut up. I am telling you to not act offended that a tangible working solution was chosen over a hypothetical solution. In other words, don't act like the universe is unfair because Paul McCartney is famous for songwriting while you are not, even though you totally could have hypothetically written better songs.

> "I don't see how, given an alternative history, computers wouldn't favour for example the Russian alphabet."

In an alternative universe where CP1251 was picked as the basis of the first block in Unicode instead of ASCII, it would have been for the same reasons that ASCII was picked in this universe.

In that universe, you'd just be complaining that Unicode was Russo-centric.

What reason, in this universe, would there have been to go that route?

> Compressing as in something like Huffman encoding. Maybe I was misusing the names.

http://en.wikipedia.org/wiki/Lossless_compression#Limitation...

Huffman encoding is a method used for lossless compression of particular texts. It does not let you put more than 256 characters into a single byte in a character encoding.

The guys that made JIS X 0212 were not missing something when they made JIS X 0208, a two byte encoding, prior to Unicode.

> And the fact that IT is Anglo centric goes way beyond Shannon entropy.

Okay. Complain about instances where it actually exists, and in discussions where it is actually relevant.


> In that universe, you'd just be complaining that Unicode was Russo-centric.

Yes, obviously.

> It does not let you put more than 256 characters into a single byte in a character encoding.

Which I have never claimed. (EDIT: I think we're talking past each other: my point was that things like Huffman encoding encode the most frequent data with the fewest bits. I don't know how UTF-8 is implemented, but it seems conceptually similar. There is a reason that I didn't want to get anywhere near the nitty-gritty of this.)
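For what it's worth, one way to check the UTF-8 comparison: in UTF-8 the byte count of a character depends on the numeric range of its code point, not on how often the character shows up in text. A minimal Python sketch follows; the helper name `utf8_len` and the sample characters are my own picks for illustration, not anything from the thread.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

# Arbitrary sample characters (an assumption for illustration):
# a common Latin letter, a rare Latin letter, Cyrillic, a kanji, an emoji.
samples = ["e", "z", "я", "日", "😀"]
lengths = {ch: utf8_len(ch) for ch in samples}
print(lengths)
```

Note that "e" and "z" cost the same single byte even though "e" is far more frequent in English, which is where the analogy with a frequency-based code like Huffman stops holding.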

> Okay. Complain about instances where it actually exists, and in discussions where it is actually relevant.

Oh yes, I will complain.


A character encoding has a uniform distribution over its code points: each code point is represented exactly once.

"For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding."

Huffman encoding something written in Japanese is useful. It is not useful for creating a Japanese character set.
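The quoted claim can be checked with a small sketch. This is a bare-bones Huffman construction that only tracks code lengths, not the actual bit strings; the function name `huffman_lengths` and both frequency tables are made up for illustration.

```python
import heapq
from itertools import count

def huffman_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(w, next(tie), {sym: 0}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one bit deeper.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

# Hypothetical inputs: 8 equally likely symbols vs. a skewed 4-symbol text.
uniform = {chr(ord("a") + i): 1 for i in range(8)}
skewed = {"e": 50, "t": 25, "q": 15, "z": 10}
print(huffman_lengths(uniform))
print(huffman_lengths(skewed))
```

With the uniform table every symbol ends up with the same 3-bit length, i.e. a plain fixed-width code. Only the skewed table, which stands in for the symbol frequencies of an actual text, gets shorter codes for frequent symbols.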

Get it?

If you don't buy it, then try it on pen and paper. Imagine a hypothetical 10-character alphabet, and try to devise an encoding that will let you fit it into a two-bit word, without going multi-word. Use prefix codes or whatever you want.

It's not going to happen. You also aren't going to get 10k characters into an 8-bit/word single-word character set.
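If pen and paper isn't handy, the same counting argument can be sketched in Python. The helper names below are hypothetical; the second check uses Kraft's inequality, which says a binary prefix code with given codeword lengths exists only if the sum of 2^-length over all codewords is at most 1.

```python
from fractions import Fraction

def min_fixed_bits(n_symbols: int) -> int:
    """Smallest fixed word width (in bits) that has at least n_symbols distinct values."""
    return max(1, (n_symbols - 1).bit_length())

def kraft_sum(lengths) -> Fraction:
    """Sum of 2**-l over all codeword lengths, kept exact with fractions."""
    return sum((Fraction(1, 2 ** l) for l in lengths), Fraction(0))

def prefix_code_exists(lengths) -> bool:
    """Kraft's inequality: a binary prefix code with these lengths exists iff the sum is <= 1."""
    return kraft_sum(lengths) <= 1

print(min_fixed_bits(10))            # bits needed for a 10-character alphabet
print(min_fixed_bits(10_000))        # bits needed for 10k characters
print(prefix_code_exists([2] * 10))  # ten codewords of two bits each
print(prefix_code_exists([2] * 4))   # four codewords of two bits each
```

Ten two-bit codewords give a Kraft sum of 10/4, well over 1, so prefix codes don't rescue the two-bit case either; shorter codewords would only make the sum larger.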



