
Standard English text has around 1 bit of entropy per character, but that doesn't mean you can only write 2^64 non-gibberish texts in 64 characters. For example, there are around 40k eight letter words, way more than 256. The entropy is so low because we keep using the same words, but we don't have to. We can also use abbreviations, invent proper nouns, and we don't even have to limit ourselves to English or even a single language.
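To put a rough number on the word-count point, here is a back-of-the-envelope sketch in Python. It assumes each eight-letter word is picked uniformly at random from a 40,000-word list, which real text is nowhere near doing; the function name `bits_per_char` is made up for illustration.

```python
import math

def bits_per_char(num_choices: int, num_chars: int) -> float:
    """Entropy per character when one of num_choices equally likely
    strings fills num_chars characters."""
    return math.log2(num_choices) / num_chars

# At 1 bit per character, 8 characters allow only 2**8 = 256 strings.
print(2 ** 8)

# Picking uniformly from ~40k eight-letter words gives roughly
# 1.9 bits per character under this (unrealistic) uniform assumption.
print(round(bits_per_char(40_000, 8), 2))
```

The gap between that figure and the measured 1 bit per character is the "we keep using the same words" effect.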

"犬 in French is chien, that one is Klign jr" is perfectly valid.

I don't know if all 8s is possible; UTF-8 is quite wasteful for that purpose. Maybe we could take advantage of kanji. I don't know enough about Japanese, and even less about Chinese, but it looks like you could make a proper noun by mashing any kanji together, including the obscure ones, and it will be usable and pronounceable. Maybe not pretty, but valid.
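For what "wasteful" means in practice, a small Python sketch, assuming the limit being discussed is 64 bytes of UTF-8 (the helper `utf8_len` and the constant `BUDGET` are names invented here):

```python
def utf8_len(text: str) -> int:
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))

BUDGET = 64  # assumed byte limit

print(utf8_len("a"))             # an ASCII letter takes 1 byte
print(utf8_len("犬"))            # this kanji takes 3 bytes
print(BUDGET // utf8_len("犬"))  # how many such 3-byte characters fit

sample = "犬 in French is chien, that one is Klign jr"
print(utf8_len(sample) <= BUDGET)  # whether the mixed sentence fits the budget
```

So a text made only of 3-byte kanji gets about a third as many characters as an ASCII one, and each kanji has to carry correspondingly more information to make up for it.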

Edit: And as a last resort, we could cheat with "password: }8pHgaQ^?7ic'6KIO!uDXQnhL3(6hcfZmRYnGUw1Pz`c?y@D"




> For example, there are around 40k eight letter words, way more than 256. The entropy is so low because we keep using the same words, but we don't have to.

Correct, but that low entropy is what distinguishes

    Monday Tuesday Wednesday Thursday Friday Saturday Sunday
from (sampled from random Wikipedia article titles)

    Alabama Christopher List Park Girlfriend Manor crucifera
or

    I saw your dad outside of Walmart yesterday.
from (5th word of 2nd section of random Wikipedia articles)

    gospel Rich existed school and for Deputy he

> We can also use abbreviations, invent proper nouns,

Sure, make it 1.1 bits per character (the 1 bit is not a precise number anyway). That doesn't change anything about the orders of magnitude involved.

> we don't even have to limit ourselves to English or even a single language.

I tried to look for the entropy of Chinese writing, and I've only found an upper bound of 3.8 bits per UTF-8 byte. That makes it still unlikely but at least conceivable that there is an all-8 amulet made of 64 bytes of Chinese text.
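The orders of magnitude can be checked with a short Python sketch. Two assumptions that are mine and not established above: the all-8 target is one specific 256-bit digest, and the hash behaves like a uniformly random function. The names `TARGET_BITS`, `LENGTH` and `log2_expected_hits` are made up for the example.

```python
TARGET_BITS = 256  # assumption: "all 8s" means one specific 256-bit digest
LENGTH = 64        # characters (English) or bytes (Chinese) available

def log2_expected_hits(bits_per_unit: float, units: int = LENGTH) -> float:
    """log2 of the expected number of plausible texts hitting one fixed digest,
    treating the hash as a uniformly random function (an assumption)."""
    return bits_per_unit * units - TARGET_BITS

for label, rate in [
    ("English, 1.0 bit/char", 1.0),
    ("English, 1.1 bits/char", 1.1),
    ("Chinese, 3.8 bits/byte (upper bound)", 3.8),
]:
    print(f"{label}: about 2^{rate * LENGTH:.1f} texts, "
          f"expected hits about 2^{log2_expected_hits(rate):.1f}")
```

Going from 1.0 to 1.1 bits per character only moves the exponent from 64 to about 70, still roughly 2^-186 expected hits. The Chinese upper bound gives about 2^243 texts, so around 2^-12.8 expected hits, which is where "unlikely but conceivable" comes from.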

> UTF-8 is quite wasteful for that purpose.

Right, but as far as I understand that's the constraint.

> And as a last resort, we could cheat with "password: }8pHgaQ^?7ic'6KIO!uDXQnhL3(6hcfZmRYnGUw1Pz`c?y@D"

Yes, but that's not interesting from either an art or a computer science point of view.



