Nice tutorial! One small nitpick: > Calculating a hash value from an input is co...

sz4kerto · on July 21, 2020

Actually, it might be occasionally possible. Not perfectly, of course. If the input space is limited and you have other constraints on it, then you might be able to find the original text. For example, let's say your input is a 280 character long tweet, the hash is only 16 bytes long. If you find that "I AM A STABLE GENIUS" is one of the inputs that hashes to the given hash value, then you can be reasonably sure that this was the original text, given that most of the other inputs with the same hash won't be meaningful English sentences.

talaketu · on July 21, 2020

I'd say the count of believable English tweets would be many orders of magnitude greater the count of 16 byte hashes, so in general we can't just find a believable tweet and declare that we found the plaintext. There would be many possible collision.

The phrase you mentioned is notable (with a slight edit) - so I suppose you mean you could index phrases by notability, then it may be tractable find tweets of notable phrases, since the number of notable phrases is relatively low.

lamida · on July 21, 2020

I think the author means recovering the hash by using dictionary attacks

exdsq · on July 21, 2020

You’d still have hash collisions

hansvm · on July 21, 2020

Maybe not enough to matter though. If the input space is sufficiently smaller than the hash space (e.g., you aren't hashing arbitrary strings, but instead just arbitrary English words) then the probability of any hash collision that also lies in the realm of interest is vanishingly small.

exdsq · on July 21, 2020

True, but that'd be very esoteric - a string input field that only accepts values from a dictionary. Although also if the input space was only these values but the input size was infinite you'd have a similar issue :P

hansvm · on July 21, 2020

It'd be esoteric for a system to enforce only containing values from a dictionary, but every sentence ever spoken by anyone alive fits in a 64-bit space by a wide margin. For real-world input you could still reverse most passwords uniquely given enough time.

It's actually kind of a fun problem -- the fewer bits you have in your hash the easier it is to find _any_ collision that gives access to the current system, but the harder it is to uniquely reverse the hash into a plausible password for stuffing into other systems.

KMag · on July 21, 2020

I think the author is just using imprecise language for a preimage attack.[0]

[0] https://en.wikipedia.org/wiki/Preimage_attack

oconnor663 · on July 21, 2020

Information is sometimes destroyed, but we can't count on it. An algorithm like HMAC might be used to authenticate a short message. Or even if the message is long, it might also be known, so perhaps the only unknown input (the key) is short. HMAC still provides security in this case, but it's not provable-because-information-is-destroyed in the sense you're describing.

netsectoday · on July 22, 2020

215D70DA68DBDC2217D13916E40F89B87DDC3DAACA99892C943E0DCC76C8C1AF68097C86CC4CB53F15BD6C3870418A1E3577A076611A5104C6D4FC33939A21FB

dreamcompiler · on July 22, 2020

I said "in general".

netsectoday · on July 22, 2020

> in general a hash code contains less information

That is true, but has nothing to do with your ability to use a rainbow table or dictionary attack on my hashed statement.

The specific reason why you can reverse my hash is because it's unsalted. The time-complexity of my example hash is somewhat low.

'you are wrong' is 13 characters that spans the lowercase alphabet with spaces, in all english words that can be found in a dictionary. The actual key-space is 13^10 because of the limited characters I used, but when creating a rainbow-table it would probably take a key-space of 13^27 because you don't know the specific characters I'm using. (There are 13 characters in my statement, then there are 26 lower-case letters in the alphabet - plus space - which makes 27... 13^27)

Now, if I followed general op-sec suggestions I would have added a salt to my statement, so the process looks like SHA512('you are wrong' + '7aomAxgeVjAvyDXGrdmNJNKuuiumYbkG') which gives you the possible key-space of 45^63. Also, the salt should be something not found in a dictionary.

2812CDA67BF0EA2D5FE125C2637466FA9A81C12AE9D101771C581DB87EBEB05D7724C9DCFBC6F905B99F9737948543EC64CAC5D89C785125DDA2E3297214CC58

On the upper-side of the estimate there are 10^82 atoms in the universe. There are about 14^103 possible solutions to that salted hash above, with many possible collisions. Good luck with creating a database that needs more rows than available atoms, or waiting for a brute-force to complete on that hash, then sorting through the collisions. This is the "impossible" reversible hash you are talking about.

If you are interested in this topic and would like to do this in your code: use the HMAC function which is built for this, or specifically for passwords use bcrypt/blowfish because that algorithm is designed to run slowly (brute force protection).