Deprecating SHA1 (ietf.org)
146 points by todsacerdoti on Oct 23, 2020 | 109 comments



In case anyone else is wondering, it looks like this will show if any SHA1 signatures are involved (brew or apt install pgpdump).

  gpg --export -a {keyname} | pgpdump
Surprised it isn't in gpg itself - maybe that's why Debian developers hadn't noticed.


I have no idea how to interpret that output. "SHA1" is listed last under "Hashed Sub: preferred hash algorithms(sub 21)(5 bytes)" but I have no clue what that means. I created the key with a recent gpg using default settings this year.


That's your preference for what sort of hash you would like others to use when sending you a signed message. It is in order from most preferred to least preferred so SHA1 is last. Look for the "Hash alg" of the signature packets for your signing key and encryption subkey. Mine looks like this for both:

    Hash alg - SHA256(hash 8)


Strange that people are still implementing technology around SHA1. It is still challenging to find collisions, but nonetheless SHA1 is demonstrably broken.


Just like security in the physical world, sometimes the weakness is not that great in practice. If you watch enough lockpicking videos on YouTube you may come to the same conclusion.

Would I still trust MD5 to be a strong check against random/non-malicious corruption? Yes. Would I trust it against malicious corruption? No.


In forensics work, MD5 is still used for checksums. It's used for consistency, not uniqueness, so it will likely be usable for years to come.


I think a lot of semi-delusion happens around whether or not cryptographic hash functions are used for integrity and what kind of integrity.

I would argue against this two ways: First, if the only concern was bit-error detection, forensic tools would use a simple checksum. CRC, for example, is far faster to compute than any of these cryptographic functions. Second, one of the functions of evidential custody is to provide some defense against malicious tampering with evidence, even if unlikely. And for this purpose MD5 is clearly an inferior choice to a SHA2 function.

The real reason that forensics tools stick to MD5, I would contend, is a combination of the state of the art in filesystem forensics tools being surprisingly bad and bureaucratic lockin related to documentation and interoperability (the chain of custody form asks for MD5, after all, not SHA512).

This kind of factors into one of my hills to die on, which is that careless or cargo-cult selection of hash functions causes problems. When you select a hash or integrity function for files, you need to make a clear decision about whether your goal is to prevent collisions or simply to detect bit errors. You should then choose a function that is state-of-the-art for one of those two categories. The whole line of reasoning behind "MD5 is good enough because we're not really using it for security" is just asking for trouble. When you split the difference you either have a bad integrity check, a bad security control, or more likely both (MD5).
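
To make that concrete, here is a minimal Python sketch (function name and chunk size are mine) computing both kinds of check in one pass: a CRC32 checksum for bit-error detection and a SHA-256 digest for tamper evidence.

    import hashlib
    import zlib

    def integrity_values(path, chunk_size=1 << 20):
        """Return a CRC32 checksum (bit-error detection) and a SHA-256
        digest (tamper evidence), computed in a single pass over the file."""
        crc = 0
        sha = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                crc = zlib.crc32(chunk, crc)
                sha.update(chunk)
        return format(crc & 0xFFFFFFFF, "08x"), sha.hexdigest()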


> CRC, for example, is far faster to compute than any of these cryptographic functions.

...and offers far less protection even from random corruption; the biggest common CRC is 64 bits, which is really small in comparison to 128-bit MD5 or 160-bit SHA1.


I assume GP just used CRC as an example everyone is familiar with. A more modern non-cryptographic hash function like FNV or MurmurHash can produce more output bits (up to 1024 and 128 respectively), reducing random collision probability to a comparable level.

In addition, CRCs are actually quite good at detecting common errors in data transfer (not just single bit flips but burst errors as well). In fact you can prove that certain kinds of errors will always be detected by an n-bit CRC -- which is something you can't prove with modern cryptographic hash functions. Yeah they're useless against attackers but that's not the threat model if you're "not using MD5 for security".


If you're dealing with a situation in which errors occur at a rate high enough to see random collisions in even CRC16, you need FEC if not spread spectrum techniques. CRCs can provably detect errors in up to n bits of various configurations depending on the selection of the polynomial... and CRC dates back many years and is far from the state of the art. MD5 can produce collisions with as few as 6 bits differing, far less than the 128-bit length would suggest.

Cryptographic hash functions are designed with different objectives and methods and are often less well suited, and almost always less well analyzed, for simple error-detection situations.


Since you're not doing any birthday attacks in checking for corruption, 64 bits is perfectly fine. Corruption checking with 64 bits is safer than many uses of md5 would be, even if md5 had no flaws.
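
To put rough numbers on that distinction, a small sketch (illustrative, not from the thread) of the two different probabilities:

    import math

    def birthday_collision_probability(n, bits):
        # Chance that *any* two of n random hashes of the given bit width
        # collide (the setting that matters for attacks or content-addressed dedup).
        return 1.0 - math.exp(-n * (n - 1) / 2.0 ** (bits + 1))

    def false_accept_probability(bits):
        # Chance that a single corrupted file happens to hash to the same
        # value as the original (the corruption-checking setting).
        return 2.0 ** -bits

    # For 64 bits: one corruption check fails silently with probability
    # ~5.4e-20, while a billion stored objects already have roughly a 2.7%
    # chance of some birthday collision among themselves.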


If the authors of Blake3 are to be believed, MD5 is slower than Blake3, Blake2, and SHA1. And on par with SHA512.

(On a given architecture)

https://raw.githubusercontent.com/BLAKE3-team/BLAKE3/master/...

I could understand if hashing is a large portion of their CPU time. But if it's not, why not compute MD5 and also something good and store both hashes?
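
Computing both in one pass is cheap if the work is I/O-bound anyway; a minimal sketch (function name and chunk size are mine):

    import hashlib

    def dual_digest(path, chunk_size=1 << 20):
        """One pass over the file, updating a legacy MD5 and a modern
        SHA-256 digest side by side."""
        md5 = hashlib.md5()
        sha256 = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                md5.update(chunk)
                sha256.update(chunk)
        return md5.hexdigest(), sha256.hexdigest()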


> (On a given architecture)

Yes this is a good caveat. When I've tried the OpenSSL implementations on older ARM chips (a Raspberry Pi Zero), I've seen MD5 come out on top. I'm not sure exactly why it switches.


I think they're just inverted on x86 because it provides hardware acceleration for SHA-1 and SHA-2.

https://en.wikipedia.org/wiki/Intel_SHA_extensions

https://github.com/openssl/openssl/blob/master/crypto/sha/as...


I find SHA-1 to be faster than MD5 on Skylake, which does not have the SHA extensions.


I use SHA1 for duplicate detection on hard drives. It's more robust than MD5 and fast on modern processors. Nothing wrong with that. But I would not use it for crypto.
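
A rough sketch of that kind of duplicate finder (grouping by size first so most files never need to be hashed; error handling omitted):

    import hashlib
    import os
    from collections import defaultdict

    def find_duplicates(root):
        by_size = defaultdict(list)
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path):
                    by_size[os.path.getsize(path)].append(path)

        by_digest = defaultdict(list)
        for size, paths in by_size.items():
            if len(paths) < 2:
                continue  # a unique size cannot have a duplicate
            for path in paths:
                h = hashlib.sha1()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                by_digest[h.hexdigest()].append(path)
        return {d: ps for d, ps in by_digest.items() if len(ps) > 1}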


Time to move on to SHA-512/256 or BLAKE3


Or maybe xxHash would work for duplicate finding? https://github.com/Cyan4973/xxHash


What does consistency mean in this context?


Given an input image of a hard disk or other storage medium, validate that the hash when given to you and after your analysis are the same.


As protection against someone maliciously swapping out drives? Or as a guard against accidentally working on the wrong hard drive?

For the former, md5 isn't secure (although it's not really an attack you can pull off after the fact). For the latter, I guess so, but it'd be hilarious if whatever attacker is causing you to do forensics made all the hard drives have the same hash to make your life more complicated.


Forensics work mostly involves investigating criminal or civil matters.

Network intrusions, or someone sending a fucked up email at work, would require you to retrieve data, whether it be some logs or a full hard disk copy.

The idea is that I do analysis and can prove that what I started with (hash1) is the same as when I ended (hash2). If the hash changed, either there was a hardware failure (on write blockers) or a mistake was made when analysing the data.
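
In code, that workflow is just hashing the image before and after analysis and comparing (a minimal sketch; the file name is illustrative and the algorithm is whichever one your chain-of-custody paperwork asks for):

    import hashlib

    def image_digest(path, algo="sha256", chunk_size=1 << 20):
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # hash1 = image_digest("evidence.img")   # recorded at acquisition time
    # ... read-only analysis happens here ...
    # hash2 = image_digest("evidence.img")   # recomputed afterwards
    # assert hash1 == hash2, "image changed: hardware fault or handling mistake"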


If they are already good at keeping multiple points of view in mind (MD5 can be broken for one application but still works for others), then they should also be able to reason that MD5 might work passably for this application, but it is time to move on.


Yikes. Surely in forensics you’d want the most collision resistant hashing tool available? Doesn’t the uniqueness guarantee the consistency?


That's kinda the issue: you only want to guarantee that the image didn't change (i.e., no one planted anything and the chain of evidence from the crime scene to your nice forensics lab is untampered).

I think once someone makes a tool that allows you to add a file to an image and have it resolve to the same hash (anti-forensics as a field), it will probably change. Most tools do offer dual MD5 and SHA256 support though.


> you only want to guarantee that the image didn’t change

But if the algorithm allows you to find hash collisions, you can’t guarantee that the image didn’t change based on the MD5 hash value? e.g. https://natmchugh.blogspot.com/2014/11/three-way-md5-collisi...

Obviously that example is a chosen prefix collision, but this data is coming from an untrusted source after all, so there’s really nothing to stop the attacker choosing the prefix in advance and then publicly shredding trust in the hash at a later date. In practice it sounds like you’d also have a hash of the complete file system, but at this point you’d have to question what advantage there is to using MD5 at all. Attacks never get worse, only ever get better, and the last thing you’d want is for the dam to burst during a lengthy and important investigation.


Yeah, I do wonder if the tooling to cause a full MD5 hash collision across the file system will come in the future.


That's a second preimage attack, not a collision attack.


I thought a second preimage attack implied a collision attack?


Yes but the reverse isn't true. Collisions don't imply second-preimage attacks, at least not at any practical difficulty.


Still seems less than ideal to be able to generate a second document that hashes to the same value.


Your phrasing here is unclear.

Being able to construct B such that hash(B) = hash(A) for some document A is Second Pre-image and there is no viable Second Pre-image attack for SHA-1 or even MD5.

What does exist is collision, where you construct documents C and C' such that hash(C) = hash(C').

This means bad guys can persuade humans (and more importantly in most cases, machines) to sign seemingly innocuous document C; and then apparently produce proof document C' was signed even though the human / machine has never seen C' and would not have signed it.
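
The published SHAttered PDFs (shattered-1.pdf and shattered-2.pdf from shattered.io) are exactly such a C and C': distinct documents with identical SHA-1 but different SHA-256. A quick check, assuming you have downloaded the two files:

    import hashlib

    def digests(path):
        with open(path, "rb") as f:
            data = f.read()
        return hashlib.sha1(data).hexdigest(), hashlib.sha256(data).hexdigest()

    sha1_a, sha256_a = digests("shattered-1.pdf")
    sha1_b, sha256_b = digests("shattered-2.pdf")
    print(sha1_a == sha1_b)      # True: a SHA-1 signature over one "covers" the other
    print(sha256_a == sha256_b)  # False: SHA-256 still tells them apart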


Collision combined with length extension could be a neat way of attacking deduping.

Basically first a prefix, then a collision block, then a suffix that is encrypted with a key derived from one of the collision blocks.

If you can arrange for the deduper to find the document with the wrong key first, then it will skip over the super secret stuff.


For deduplicated storage you really want A=B <=> H(A)=H(B) to hold for all data you are going to see, so accessible constructions for collisions should already bar an algorithm from this application.


I 100% agree, but some people seem to go "eh it's not a problem in practice" for as long as they possibly can.


I would, depending on what I was protecting. I use it every day in Git, for example, on projects where integrity is of moderate importance. Now, against a nation-state? No way.


Against a nation state, that ship sailed a long time ago.

We're currently in the realm of 10-20 thousand dollars to make SHA-1 collisions. That's in the realm of an individual, albeit a reasonably (but not extremely) wealthy one. In 5-10 years it may be in the realm of J. random hacker.


Naive question: how often is that collision the same number of bytes as the original?


I've never seen a colliding pair in a "true" hash function (not just CRC) be of differing lengths. They've always been a pair of blocks with very few differing bits, such that the differences "cancel out" in the computation and the internal state is the same regardless of which of the pair is run through it. I suspect trying to create a colliding pair with differing lengths is much more difficult, since SHA1 also includes the length as part of the calculation.
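
The SHAttered pair is a good example: both PDFs have the same length, and the differing bytes sit entirely inside the crafted collision blocks near the start of the file. A small way to check, assuming the two files are present locally:

    def diff_summary(path_a, path_b):
        with open(path_a, "rb") as f:
            a = f.read()
        with open(path_b, "rb") as f:
            b = f.read()
        differing = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        span = (min(differing), max(differing)) if differing else None
        return len(a), len(b), len(differing), span

    # diff_summary("shattered-1.pdf", "shattered-2.pdf")
    # -> equal lengths, with the differing bytes clustered in a narrow
    #    range corresponding to the collision blocks.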


I'm a bit divided. On one hand I agree with you, and I even used md5 for things that don't need security, but then I started thinking about it and came to the conclusion that you never know how your code might be exploited in the future.


It's not "strange"... earlier hash algorithms tend to enjoy better performance and compatibility, and there are still use cases for which SHA1 (and even MD5) aren't broken.


SHA-256 has special instructions in silicon on many CPUs. Performance is not a reasonable excuse. If performance mattered and you didn't need SHA's security properties, you would be using CRC32 or something.


Excused or not, I'm merely explaining why this isn't surprising.


But there are even better choices for non-cryptographic hash functions (FNV and MurmurHash for instance) because they were designed without the need to have a tradeoff between cryptographic security and efficiency.


You're assuming that all code performing SHA requires a "secure" hash.

It's only "broken" for security applications, and there are tons of others for which security is not even a consideration, or where collision is not possible due to input constraints.


There are hashes which require much less computing power and thus have better throughput if you don't require cryptographic security (examples include FNV-1 and MurmurHash). Using SHA-1 because you "don't need it for security" is still a poor choice.

I'd also argue that the default choice of hash function should always be a secure one until a strong counterargument is given, because it turns out that at some point most developers assume hashes have some minimal security properties (Linus always claimed that the choice of hash function in Git was not about cryptographic security -- and yet the security of the GPG signing of Git objects depends entirely on SHA-1 being cryptographically secure because Git signs the commit hash, and the BitKeeper attack against Linux's source tree only failed because the attack was done so poorly they didn't try to find a collision in the commit ID hash function). Better to be safe than sorry.
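
For contrast with the SHA family, here is how small a non-cryptographic hash can be; a minimal 64-bit FNV-1a sketch (fine for hash tables or sharding, useless against an adversary):

    FNV64_OFFSET = 0xcbf29ce484222325
    FNV64_PRIME = 0x100000001b3

    def fnv1a_64(data: bytes) -> int:
        """64-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
        h = FNV64_OFFSET
        for byte in data:
            h ^= byte
            h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
        return h

    # fnv1a_64(b"") == 0xcbf29ce484222325 (the offset basis, by construction)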


Many modern CPUs (most 64-bit ARM, AMD Ryzen, Intel Goldmont to Tremont and Ice Lake/Tiger Lake) have instructions to compute SHA1.

On these computers computing SHA1 may have much better throughput than any suitable non-cryptographic hash.


I agree SHA-1 is a poor choice for signing git objects, for reasons that should have been obvious when it was implemented.

Unfortunately I don't think the same developers who are assuming all hashes have some kind of magic security properties, are the ones who would follow your best practice for default choice of hash functions.

Also I wish some of those higher throughput hashes were available in e.g. python's hashlib by default. For applications that can include minimal or no external packages, hashlib.sha1 still turns out to be the best choice in certain cases.
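
A rough way to compare throughput on your own machine with nothing but hashlib (buffer size and algorithm list are arbitrary, and results vary a lot by CPU):

    import hashlib
    import time

    def throughput_mib_s(algo, payload=b"\x00" * (64 << 20)):
        h = hashlib.new(algo)
        start = time.perf_counter()
        h.update(payload)
        h.digest()
        return len(payload) / (1 << 20) / (time.perf_counter() - start)

    for algo in ("md5", "sha1", "sha256", "sha512", "blake2b"):
        print(algo, round(throughput_mib_s(algo)), "MiB/s")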


That would have to be some pretty constrained input if you are ruling out collisions in principle.


Someone in my company is using MD5 for passwords. This is recent software, like built in the last decade. Passwords were not a mystery to anyone when it was written, at least not anyone who cared about security.

I don't know what to say about it. It's not the first time something dumb has happened, and I'm tired of screaming at the brick wall.


Depends on how they use it. HMAC-MD5 is not broken, for example. MD5 could potentially be a building block for password hashing still (but it has to be a _building block_)


Well the reason that md5 is a bad choice for that has nothing to do with collisions or this article. Sha-256 would also be a bad choice for that.


Well they are not in this particular case. These are old keys.


The article said they were new keys.


This is one of the many ways that shows how broken (in the "defective software" sense, not necessarily directly the cryptographic sense, yet) the entire PGP/GPG ecosystem is.


Does modern gpg still use SHA1 for something? Or is it just kept there for backwards compatibility (e.g. checking old signatures) ?


Some may argue that having an undefined timeframe of backwards compatibility spanning more than a decade is a sign of a broken ecosystem.


others would argue that it's a sign of stability and maturity.


This kind of "stability and maturity" is exactly what you don't want in a cryptosystem.


Did my research here. I was expecting GnuPG to never emit a message including SHA1, just use it to verify signatures from the existing ones. A pretty sensible thing, AFAIK.

To my surprise, gpg2 added SHA1 as a valid MD algo in a new key because the specification says it MUST be supported and implementations should use it if no other is found in the key preferences. However, it is the one with the lowest priority and will not be used with any key that was generated in the last decade or had its preferences changed to use another MD.

A key created with the default options:

    gaius@baseship:/tmp$ mkdir newhome
    gaius@baseship:/tmp$ gpg --homedir newhome --gen-key   
    gpg: WARNING: unsafe permissions on homedir '/tmp/newhome'
    gpg (GnuPG) 2.2.4; Copyright (C) 2017 Free Software Foundation, Inc.
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.    

    gpg: keybox '/tmp/newhome/pubring.kbx' created
    Note: Use "gpg --full-generate-key" for a full featured key generation dialog.
                                 
    GnuPG needs to construct a user ID to identify your key.
                 
    Real name: Gaius Baltar                                  
    Email address: baltar@defense.caprica.kob
    You selected this USER-ID:                       
        "Gaius Baltar <baltar@defense.caprica.kob>" 
                                           
    Change (N)ame, (E)mail, or (O)kay/(Q)uit? o
    We need to generate a lot of random bytes. It is a good idea to perform
    some other action (type on the keyboard, move the mouse, utilize the
    disks) during the prime generation; this gives the random number
    generator a better chance to gain enough entropy.
    We need to generate a lot of random bytes. It is a good idea to perform
    some other action (type on the keyboard, move the mouse, utilize the
    disks) during the prime generation; this gives the random number
    generator a better chance to gain enough entropy.
    gpg: /tmp/newhome/trustdb.gpg: trustdb created
    gpg: key 0832341A4A0CE664 marked as ultimately trusted
    gpg: directory '/tmp/newhome/openpgp-revocs.d' created
    gpg: revocation certificate stored as '/tmp/newhome/openpgp-revocs.d/C64E6636890D8BAACA843AFC0832341A4A0CE664.rev'
    public and secret key created and signed.

    pub   rsa3072 2020-10-24 [SC] [expires: 2022-10-24]
          C64E6636890D8BAACA843AFC0832341A4A0CE664
    uid                      Gaius Baltar <baltar@defense.caprica.kob>
    sub   rsa3072 2020-10-24 [E] [expires: 2022-10-24]

    gaius@baseship:/tmp$ gpg --homedir newhome --edit-key baltar
    gpg: WARNING: unsafe permissions on homedir '/tmp/newhome'
    gpg (GnuPG) 2.2.4; Copyright (C) 2017 Free Software Foundation, Inc.
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

    Secret key is available.

    gpg: checking the trustdb
    gpg: marginals needed: 3  completes needed: 1  trust model: pgp
    gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
    gpg: next trustdb check due at 2022-10-24
    sec  rsa3072/0832341A4A0CE664
         created: 2020-10-24  expires: 2022-10-24  usage: SC  
         trust: ultimate      validity: ultimate
    ssb  rsa3072/89519D45B7AECE4D
         created: 2020-10-24  expires: 2022-10-24  usage: E   
    [ultimate] (1). Gaius Baltar <baltar@defense.caprica.kob>

    gpg> showpref
    [ultimate] (1). Gaius Baltar <baltar@defense.caprica.kob>
         Cipher: AES256, AES192, AES, 3DES
         Digest: SHA512, SHA384, SHA256, SHA224, SHA1
         Compression: ZLIB, BZIP2, ZIP, Uncompressed
         Features: MDC, Keyserver no-modify

    gpg>


I don't know that the hash preferences in the public key are relevant to the issue talked about in the linked email thread. Currently GPG (and Sequoia) will reject the signing of keys using SHA1 if they claim to be recent.

The preferences would allow you to send me two messages that used the same SHA1 hash for the signature. I would never know that it had happened as I would never see the hash, only the valid signature. Both signatures would be logically valid as they came from your assertive action.


Au contraire, this is a good example of how well the PGP/GPG ecosystem deals with the realities of a long term cryptography standard. People with SHA1 hashes in their signatures have a reasonable way to transition away from those signatures while maintaining the reputation of those identities. There really isn't anything out there as good.

I should also point out that the attack this is addressing (Shambles) has close to no practical implications. So this is a good example of how the PGP/GPG ecosystem takes security breakage very seriously.


Just like git, which has much wider impact, considering how many more people use software from git than use pgp or gpg.


I don't think git is using SHA1 for security though (outside of potentially when signing commits, but that's generally done with gpg, which you drew a distinction with). Assuming you're talking about commit hashes, I'm not particularly worried about the threat of someone crafting a commit with a malicious hash to my repo, because I have much bigger issues than commit hash duplicates if someone malicious is able to commit to my repo. Given that, SHA1 is just as safe to use for my commit hashes as it's always been; the fact that crafting collisions purposely has gotten easier doesn't make accidental collisions any more likely to happen.


> I have much bigger issues than commit hash duplicates if someone malicious is able to commit to my repo

On GitHub, all forks of a repo share the commit objects.


Our mistake is using cryptosystems that only last 20 years.

We should be using 100 year crypto. That's crypto that a crypto expert is willing to bet her entire net worth will be unbroken in 100 years.

Today, no such schemes exist.

But as far as I know, no combination of two different hashing schemes applied sequentially has ever been broken.

Therefore, why not get the Americans to develop their best hashing algorithm, the Chinese to develop their best algorithm, and then use A(C(data)).

Alternatively, bite the performance bullet, and use a provably secure hash algorithm. Build specialized hardware for the problem to speed it up.


How would hashing twice help with a collision? Surely if your hash function is A(C()) then if you find a collision in C() then that is also a collision in A(C())?

Also, just saying “we should use 100 year crypto” is about as useful as saying “we should hurry up and cure cancer”.

Nobody has a provably secure hashing function. It's actually a huge problem in cryptography that a lot of the time hashing is modelled as a perfect random oracle, and there are schemes which are secure only if modelled with a random oracle but insecure when instantiated with any concrete hashing function.


They probably mean (A(text) XOR C(text)) or something. Then you would need to find a collision in both algorithms for the same input, and that will arguably be much harder.

I.e. just because someone demonstrates a lack of knowledge in his/her post, you can still show some flexibility for positive interpretation ;).


This is only as strong as the stronger of A and C. If we had picked MD2 and MD4 to xor in the 90's we'd be no better off today.

If there was a trivial way to get much stronger hashing with double the compute time we'd do it.


If you xor the hashes together, it may be as weak as the weaker of the two functions, though it probably depends on their structure.

Getting much stronger hashing with double the compute time is easy. Double the number of rounds - most algorithms have about 30-60% security margins. It would probably avoid any breaks of the hash (so 80 bits for SHA1), though it can't be enough if the hash is simply too small (64-bit security for MD5/4/2).


Couldn't you concatenate the two hashes? Twice the size but you'd have to find a collision for both algorithms.
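
Something like this trivial sketch, i.e. a 288-bit combined digest built from both functions:

    import hashlib

    def md5_plus_sha1(data: bytes) -> str:
        # 128-bit MD5 digest concatenated with the 160-bit SHA-1 digest.
        return hashlib.md5(data).hexdigest() + hashlib.sha1(data).hexdigest()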


Finding the two collisions is not independent, so it's more like finding a collision in whichever function is stronger. (I.e. finding a collision in md5 + sha1 is roughly as hard as finding a collision in just sha1.)


Is that true?

In the past (long enough ago that using md5 almost made sense), I used a combination of md5 and sha1 to verify file integrity. Both checksums had to match or the file was rejected.

It seems obvious that that's at least as strong as using sha1 by itself. I've always assumed it was significantly stronger than sha1 by itself, but you're saying it isn't.

Creating an md5 collision isn't terribly difficult these days, but I've never heard of an md5 collision that's also a sha1 collision.

Can you cite a source with more information on this?

(I'm not doubting what you say, I'd just like more information.)


https://www.iacr.org/archive/crypto2004/31520306/multicollis...

Although looking over this again, there are some caveats: if both hash functions suffer from shortcut attacks, you might not be able to combine both shortcut attacks, depending on the details.

Regardless, in the sha1+md5 case, even if you can't combine both shortcut attacks (big if; not sure if that's true or not), the complexity of the sha1 collision is roughly the same as a birthday attack on md5, so I think you still end up with close to the same big-O as if you could combine both shortcut attacks.

Disclaimer: IANAC.


then we just need to use ALL of them


Yeah, this is similar to how people ended up inventing HMAC I think.


Provably secure hashing functions exist. For example Very Smooth Hash.

They just perform poorly.


You are right


> Our mistake is using cryptosystems that only last 20 years.

> We should be using 100 year crypto. That's crypto that a crypto expert is willing to bet her entire net worth will be unbroken in 100 years.

This sounds nice but let's wait until modern electronic computers have even been around 100 years so we can have an idea of what that means.


We're remarkably close--ENIAC was first powered up in 1946. Kinda cool when you think about it--electronic computers are WWII-era technology and a lot of us reading this now will see their 100-year anniversary.


In other words we're still over a quarter of a century away.


Mhmm, and what will you pay for that kind of guarantee, hmmm?


I think you responded to the wrong comment.


> But as far as I know, any two different hashing schemes applied sequentially has never been broken.

Suppose you used SHA256(MD5(x)) as your sequential hashing function.

If I can find two messages m1, and m2 that have the same MD5 digest, the outer SHA256 will also collide. MD5 is sufficiently broken that finding such messages is easy: https://github.com/thereal1024/python-md5-collision
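
A sketch of why the collision propagates through the composition (m1 and m2 stand for any MD5-colliding pair, e.g. one produced by the linked tool):

    import hashlib

    def composed(data: bytes) -> str:
        # Sequential composition: SHA-256 over the raw MD5 digest.
        return hashlib.sha256(hashlib.md5(data).digest()).hexdigest()

    # If md5(m1) == md5(m2), the inner digests are byte-for-byte identical,
    # so composed(m1) == composed(m2) with certainty -- the outer SHA-256
    # never even sees the difference between the two messages.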

Two different hashing schemes applied in parallel is better. However even there many uses of hashing functions require all the bits in the output digest to be equally unpredictable.


Two hashing schemes in parallel has problems too. Suppose your parallel hash is SHA3+Ident, where Ident is either the first 100 bytes of the message or the message padded with NUL up to 100 bytes. Because Ident has no preimage resistance, the combo SHA3+Ident has very poor preimage resistance too.

This thread has relevant info: https://crypto.stackexchange.com/questions/270/guarding-agai...


By "in parallel" I don't mean XOR. I mean concatenation. So in my example sha256+md5 would concatenate sha256's 256 bit digest, and md5's 128 bit digest, to form a 384 bit digest.

The intuitive explanation for why XOR, unlike concatenation, doesn't work is that it lets the output of the broken hash function influence the output of the remaining good hash function. That is obviously cause for concern!


> The intuitive explanation for why XOR instead of concatenation doesn't work, is it lets the output of the broken hash function influence the output of the remaining good hash function. That is obviously cause for concern!

The bits of a good hash function should be random. XORing with some unrelated-ish thing should be no problem. So that's not enough of an explanation.


I still don't understand what you're claiming is wrong with XOR. As I understand it, the XOR of two secure hashes should be at least as strong as each one. Of course you can find ways to deliberately come up with a broken scheme if you really want to do something spectacularly dumb (like XORing a variation of the same algorithm with itself...), at which point the problem isn't really XOR, but assuming you're being reasonable and showing some sense in the process, the likelihood of someone somehow finding an exploitable correlation between them without breaking either one should be astronomically small to not even warrant a second thought.


Just as a mental exercise (I'm no expert), how about: GoodHash = HF(input) + HF(reversed-input)? Your final hash value will still be twice as big, yes, but reversing the input should yield a different result? This is of course for non-crypto usage.


If one of your hashes is iterated, you also have the multicollision problem: once you find a collision in your weak, iterated hash, you can quickly produce more collisions, so that the strength of your construction just devolves to that of the stronger hash.


For clarity, which part of petertodd's comment are you replying to?

(I'm learning a little cryptography from this thread, so the above is a straight question.)


I believe he is responding to "Two different hashing schemes applied in parallel is better."

My understanding is if you have a collision in the stronger function, you have an infinite family of collisions in the stronger function due to the length extension attack, from there you should be able to base the collision for the weaker function on one of the stronger function's (length extended) collisions.


Thank you for that reply - particularly the informative explanation.

I was sort of assuming it must be a reference to the parallel schemes because with SHA256(MD5(x)), once you had an MD5 collision you would be done.


> Suppose you used SHA256(MD5(x)) as your sequential hashing function.

SHA256(concat(MD5(x), x)) would be significantly more effective at the possible expense of performance.

Of course I’m sure that if you’re willing to make a performance trade-off there’s going to be a better single hashing algorithm out there.


Everything in cryptography is more complicated than you'd hope it would be. On this one, see for instance:

https://cryptopals.com/sets/7/challenges/52


> A(C(data))

This is just as weak as C. Any collision in C is by definition a collision in A(C(data)).

You probably meant A(data) ^ C(data) [XOR] which is mostly as weak as the stronger of A and C. Google multicollisions in iterated hash functions for more details.


Hashing twice doesn't actually help in many cases that hashes are used. And I don't think there is a such thing as a "provably secure hashing algorithm", although I would very much be interested in learning about the topic if you have a reference :)


There are some here: https://en.m.wikipedia.org/wiki/Security_of_cryptographic_ha...

But, it does say practically provable


>Therefore, why not get the Americans to develop their best hashing algorithm, the Chinese to develop their best algorithm, and then use A(C(data)).

If you can find a collision in C(x), then A(C(x)) will have a collision as well.


Well, then make your hash:

    A(C(x))+C(A(x))
Just concatenate all the permutations of hashing functions.

Although, maybe just concatenating the single-hash values is just as secure.

    A(x)+C(x)


At four times the computation, which is a pretty big problem when you’re trying to authenticate every packet on a >= 10 Gb/s network

Using a better hash like Blake2 makes a lot more sense


If the point of using two hashes is resistance against people breaking one of the hashing algorithms used, you’re always going to have to pick more than one algorithm to get that benefit. However, I don’t think nesting the calculations is actually necessary: Blake2(msg)+sha512(msg) would give you the same protection.


> Therefore, why not get the Americans to develop their best hashing algorithm, the Chinese to develop their best algorithm, and then use A(C(data)).

Would it not be better to report both A(data) and C(data) (or maybe all three), so you have to simultaneously find a collision in both? Using A(C(data)), finding a collision in C(data) gives a collision in A(C(data)) as well.

Further, there are other considerations - it doesn't matter how secure the hash is if it's unusably slow.


> bite the performance bullet, and use a provably secure hash algorithm

Do these exist? (Genuine question; I’m not familiar with the state of the art.)


Depends what you mean by "provably secure." Hashing algorithms are usually only practical because of what compromises they make to achieve "good enough."

- At the most basic level, (practical) hashing cannot map all inputs to the same number of outputs, since consistent output length is usually highly desirable. This means that for sufficiently long inputs you will get collisions. What determines how good a hash is, is often how hard those collisions are to find.

- If your inputs map 1:1 to outputs, rainbow tables become harder but you open yourself up to a number of other possible attacks starting with the obvious length analysis.

- Hashing also often has to be very performant, which leads to its own compromises, but in many applications (like message digests) throughput performance is given a lot of weight.

In the most formal way, I guess it's possible to have a provably secure hash for all possible attacks, but this would most likely require a 1:1 mapping between inputs and outputs. This would be impractical for a few reasons.


The output has to be indistinguishable from random, and colliding n bits needs to take an exponential amount of time.

Output length can be anything. As long as it's indistinguishable from random, the chance of collision will be exponentially rare, and you can truncate or concatenate to get the length you want.


You're correct on all points, though this kind of implementation (with variable-length output) is fairly rare AFAIK. Having a fixed-length output lets you reliably define for example a database field length.

I'd love to see a practical implementation of that though, if you know where to find that kind of project off-hand!


I didn't mean "variable" when I said "anything". I was saying it didn't matter. Variable length would be valid, but so is a fixed 200 bit output.

The main thing I wanted to express is that "provably secure" doesn't mean no collisions.


Ya, there are also speed considerations when it comes to hashing that are more important than total security. If your mobile phone catches on fire, or your bandwidth is cut by half, the extra security isn't worth the trade.

Also go back to 1920 and design a public crypto system that can't be broken today.



