Hacker News new | past | comments | ask | show | jobs | submit login

Nice tutorial! One small nitpick:

> Calculating a hash value from an input is computationally very easy, but reversing the transformation and recovering the original input from its hash value takes so much time and computing power that it is, practically-speaking, impossible.

The above is true for encryption but not for hash codes. Recovering the original input from a hash code is not just practically impossible; it's provably impossible -- even with an infinite amount of computing power -- because in general a hash code contains less information than the original text.




Actually, it might be occasionally possible. Not perfectly, of course. If the input space is limited and you have other constraints on it, then you might be able to find the original text. For example, let's say your input is a 280 character long tweet, the hash is only 16 bytes long. If you find that "I AM A STABLE GENIUS" is one of the inputs that hashes to the given hash value, then you can be reasonably sure that this was the original text, given that most of the other inputs with the same hash won't be meaningful English sentences.


I'd say the count of believable English tweets would be many orders of magnitude greater the count of 16 byte hashes, so in general we can't just find a believable tweet and declare that we found the plaintext. There would be many possible collision.

The phrase you mentioned is notable (with a slight edit) - so I suppose you mean you could index phrases by notability, then it may be tractable find tweets of notable phrases, since the number of notable phrases is relatively low.


I think the author means recovering the hash by using dictionary attacks


You’d still have hash collisions


Maybe not enough to matter though. If the input space is sufficiently smaller than the hash space (e.g., you aren't hashing arbitrary strings, but instead just arbitrary English words) then the probability of any hash collision that also lies in the realm of interest is vanishingly small.


True, but that'd be very esoteric - a string input field that only accepts values from a dictionary. Although also if the input space was only these values but the input size was infinite you'd have a similar issue :P


It'd be esoteric for a system to enforce only containing values from a dictionary, but every sentence ever spoken by anyone alive fits in a 64-bit space by a wide margin. For real-world input you could still reverse most passwords uniquely given enough time.

It's actually kind of a fun problem -- the fewer bits you have in your hash the easier it is to find _any_ collision that gives access to the current system, but the harder it is to uniquely reverse the hash into a plausible password for stuffing into other systems.


I think the author is just using imprecise language for a preimage attack.[0]

[0] https://en.wikipedia.org/wiki/Preimage_attack


Information is sometimes destroyed, but we can't count on it. An algorithm like HMAC might be used to authenticate a short message. Or even if the message is long, it might also be known, so perhaps the only unknown input (the key) is short. HMAC still provides security in this case, but it's not provable-because-information-is-destroyed in the sense you're describing.


215D70DA68DBDC2217D13916E40F89B87DDC3DAACA99892C943E0DCC76C8C1AF68097C86CC4CB53F15BD6C3870418A1E3577A076611A5104C6D4FC33939A21FB


I said "in general".


> in general a hash code contains less information

That is true, but has nothing to do with your ability to use a rainbow table or dictionary attack on my hashed statement.

The specific reason why you can reverse my hash is because it's unsalted. The time-complexity of my example hash is somewhat low.

'you are wrong' is 13 characters that spans the lowercase alphabet with spaces, in all english words that can be found in a dictionary. The actual key-space is 13^10 because of the limited characters I used, but when creating a rainbow-table it would probably take a key-space of 13^27 because you don't know the specific characters I'm using. (There are 13 characters in my statement, then there are 26 lower-case letters in the alphabet - plus space - which makes 27... 13^27)

Now, if I followed general op-sec suggestions I would have added a salt to my statement, so the process looks like SHA512('you are wrong' + '7aomAxgeVjAvyDXGrdmNJNKuuiumYbkG') which gives you the possible key-space of 45^63. Also, the salt should be something not found in a dictionary.

2812CDA67BF0EA2D5FE125C2637466FA9A81C12AE9D101771C581DB87EBEB05D7724C9DCFBC6F905B99F9737948543EC64CAC5D89C785125DDA2E3297214CC58

On the upper-side of the estimate there are 10^82 atoms in the universe. There are about 14^103 possible solutions to that salted hash above, with many possible collisions. Good luck with creating a database that needs more rows than available atoms, or waiting for a brute-force to complete on that hash, then sorting through the collisions. This is the "impossible" reversible hash you are talking about.

If you are interested in this topic and would like to do this in your code: use the HMAC function which is built for this, or specifically for passwords use bcrypt/blowfish because that algorithm is designed to run slowly (brute force protection).




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: