
A character encoding table has a uniform distribution over its code points: each code point appears exactly once.

"For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding."
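To see the quoted claim concretely, here is a minimal Huffman sketch in Python. It assumes each symbol gets weight 1, since every code point appears once in the table. The function name `huffman_code_lengths` is hypothetical and only computes codeword lengths, not the codewords themselves.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return the Huffman codeword length (in bits) for each symbol weight."""
    tiebreak = count()  # unique counter so the heap never compares the lists
    # Heap entry: (subtree weight, tiebreak, indices of symbols in the subtree)
    heap = [(w, next(tiebreak), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), s1 + s2))
    return lengths


# 8 equally likely symbols: every codeword is 3 bits, i.e. a plain block code.
print(huffman_code_lengths([1] * 8))
```

With 8 uniform symbols every length comes out as 3, exactly the fixed-width block code. With 10 uniform symbols you get six 3-bit and four 4-bit codewords, so nothing shrinks toward 2 bits.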

Huffman-encoding text written in Japanese is useful. It is not useful for designing a Japanese character set.

Get it?

If you don't buy it, try it with pen and paper. Imagine a hypothetical 10-character alphabet, and try to devise an encoding that fits it into a two-bit word without resorting to multi-word sequences. Use prefix codes or whatever you want.

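It's not going to happen: a two-bit word has only 2^2 = 4 distinct values, so at least six of the ten characters would be left without a word of their own. For the same reason, you aren't going to get 10k characters into a single-word character set with 8-bit words, which offers only 2^8 = 256 values.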
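The pigeonhole arithmetic is easy to check mechanically. A small Python sketch, with hypothetical helper names `min_word_bits` and `fits_single_word`: a fixed-width word of b bits holds at most 2**b distinct values, so n characters need at least ceil(log2 n) bits per word.

```python
def min_word_bits(n_symbols: int) -> int:
    """Smallest fixed word width (bits) giving every symbol its own word."""
    if n_symbols < 1:
        raise ValueError("need at least one symbol")
    # (n - 1).bit_length() == ceil(log2(n)) for n >= 2; use 1 bit for n == 1.
    return max(1, (n_symbols - 1).bit_length())


def fits_single_word(n_symbols: int, word_bits: int) -> bool:
    """A b-bit word can take only 2**b distinct values (pigeonhole)."""
    return n_symbols <= 2 ** word_bits


print(min_word_bits(10))           # 4
print(min_word_bits(10_000))       # 14
print(fits_single_word(10, 2))     # False
print(fits_single_word(10_000, 8)) # False
```

No choice of prefix code changes this: as long as each character must fit in one word, the word width sets a hard ceiling of 2**b characters.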



