
"Once we have the SHA1 name of a file we can safely request this file from any server and don’t need to bother with security."

I don't understand this statement. Dispensing with encryption makes the request and returned content susceptible to passive observation. Saying that this approach is resistant to active manipulation assumes the existence of some kind of web of trust between hashed documents. At some point you're going to have to click on a hash without knowing its provenance. How do you know you're not being phished?


When you have a hash you already trust, you can fetch the file from anywhere without caring about the source; all that matters is whether the file matches your hash. So the only thing you need to ensure is that the hash you have is the one you really want.
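
A minimal sketch of that check, in Python – assuming SHA-256 stands in for whatever hash the naming scheme uses, and with a made-up URL and digest:

    import hashlib
    import urllib.request

    def fetch_verified(url: str, expected_sha256: str) -> bytes:
        """Fetch bytes from an untrusted server, accepting them only
        if they match a digest we already trust."""
        data = urllib.request.urlopen(url).read()
        actual = hashlib.sha256(data).hexdigest()
        if actual != expected_sha256:
            raise ValueError(f"hash mismatch: got {actual}")
        return data

    # Any mirror will do; a tampered copy simply fails the check.
    # doc = fetch_verified("http://some-mirror.example/doc", trusted_digest)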


Only if you assume that your hash function can't be collided (it probably can).


Their point is that you can only assume so much about the security of the tools you are using. If your hash algorithm of choice turns out to be insecure, you have to switch. Take "apt-get install <package>": if the SHA hash of the downloaded package matches what the (signed) repository metadata claims, you can trust the package even if it was served from a random mirror. But if you are paranoid, you shouldn't be downloading from random websites in the first place. If you want extra confidence, only download from servers you trust, over HTTPS.

Also, consider the benefit of verifying the entire file. An application might read just the first few bytes to check that a file is openable, but the first N bytes can fool you if the last M bytes are malicious. So you would open the file in a sandbox to minimize the impact.


If you're using a proper cryptographically-secure hash, it almost certainly can't be collided.


So I happily get file md5:d41d8cd98f00b204e9800998ecf8427e

Tomorrow everyone discovers that MD5 has been compromised by some organisation with a lot of money (obviously this happened long ago).

So the author needs to re-publish it as sha:adc83b19e793491b1c6ea0fd8b46cd9f32e592fc

And all my links are suddenly broken and I can't provide a mapping from old to new.

And then someone breaks SHA...

All I'm saying is that content-addressed hashing doesn't obviate the need for secure transport and trust.


MD5 has been known to be too weak for this purpose since 1995, so no competent protocol designer or publisher has used it for this purpose in decades. SHA1 is now under enough suspicion to avoid for this purpose, but not yet proven to be compromisable.

But SHA2-256 and up, and many other hashes, are still safe for this purpose and likely to remain so for decades – and perhaps indefinitely.

So within the lifetime of an application or even a person, secure-hash-naming does obviate the need for secure transport and trust. Also note that 'secure' transport and trust, if dependent on things like SSL/TLS/PKI, also relies on the collision-resistance of secure hash functions – in some cases even weaker hash functions than anyone would consider for content-naming.

(For the extremely paranoid, using pairs of hash functions that won't be broken simultaneously, and assuming some sort of reliable historical-record/secure-timestamping is possible, mappings can be robust against individual hash breaks and refreshed, relay-race-baton-style, indefinitely.)
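
As an illustration of the pairing idea, a name could carry two unrelated digests, so forging it requires a simultaneous collision in both functions (a hypothetical sketch; SHA-256 and BLAKE2b are arbitrary choices):

    import hashlib

    def dual_hash_name(data: bytes) -> str:
        """Name content by two independent digests; breaking either
        algorithm alone doesn't let an attacker substitute content."""
        sha = hashlib.sha256(data).hexdigest()
        b2 = hashlib.blake2b(data, digest_size=32).hexdigest()
        return f"sha256:{sha}+blake2b:{b2}"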


Gojomo, that was the point of using MD5 in the example. If this system had been deployed in 1995, it would have used MD5, and thus the problem of broken/obsolete links that the comment outlined would have applied to it after a few years. Who's to say the same thing won't happen to this system in 5 years?


There was no surprise break of MD5 – it came after plenty of warning, so even a hypothetical 1995 deployment would've had years for a gradual transition, and continuity-of-reference via correlation-mapping to a new hash.

So even that hypothetical example – with an early, old, and ultimately flawed secure hash – reveals hash-based naming as more robust than the alternatives.

And in practice, hash-names are as strong as, or stronger than, the implied alternative of "trust by source" – because identification of the source is, under the covers, also reliant on secure hashes… plus other systems that can independently fail.

We have experience now with how secure hash functions weaken and fail. It's happened for a few once-trusted hashes, with warning, slowly over decades. And as a result, the current recommended secure hashes are much improved – their collision-resistance could outlive everyone here.

Compare that to the rate of surprise compromises in SSL libraries or the PKI/CA infrastructure – several a year. Or the fact that SSL websites were still offering sessions bootstrapped from MD5-based PKI certificates after MD5 collisions were demonstrated.


Well, we understand hash functions a lot better now than we did back then. It would be foolish to confidently state that SHA2 or SHA3 will _never_ be broken, but it's not foolish to state that, given what we know, they are unlikely to be broken.


In context, Armstrong is only concerned about 'security' from tampering/forgery in that statement. Confidentiality is not specifically ensured... but being indifferent as to the path/server which delivers your content may help the effectiveness of other strategies for obscuring your interest, such as routing your requests through mixes of trusted and untrusted relays.


Don't you also need to at least verify the file matches the hash requested?


Yes, of course.

If you use a tree hash, the side sending you content can even include compact proofs that what they're sending is a legitimate part of a full file with the desired final hash.

So for example, if receiving a 10GB file, you don't have to get all 10GB before learning any particular relayer is a dishonest node.
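
A rough sketch of how such a proof is checked, assuming a plain binary Merkle tree over fixed-size chunks (the function names and use of SHA-256 are illustrative, not any particular protocol):

    import hashlib

    def sha256(b: bytes) -> bytes:
        return hashlib.sha256(b).digest()

    def verify_chunk(chunk: bytes, index: int, siblings: list[bytes],
                     root: bytes) -> bool:
        """Check that `chunk` is leaf number `index` of a Merkle tree
        with the given root, using the sibling hashes (ordered leaf to
        top) supplied by the sender. A dishonest relayer is caught as
        soon as one bad chunk arrives, not after the whole file."""
        node = sha256(chunk)
        for sibling in siblings:
            if index % 2 == 0:   # our node is a left child
                node = sha256(node + sibling)
            else:                # our node is a right child
                node = sha256(sibling + node)
            index //= 2
        return node == root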
