I have no idea how to interpret that output. "SHA1" is listed last under "Hashed Sub: preferred hash algorithms(sub 21)(5 bytes)" but I have no clue what that means. I created the key with a recent gpg using default settings this year.
That's your preference for what sort of hash you would like others to use when sending you a signed message. It is in order from most preferred to least preferred so SHA1 is last. Look for the "Hash alg" of the signature packets for your signing key and encryption subkey. Mine looks like this for both:
Strange that people are still implementing technology around SHA1. It is still challenging to find collisions, but nonetheless SHA1 is provably broken.
Just like security in the physical world, sometimes the weakness is not that great in practice. If you watch enough lockpicking videos on YouTube you may come to the same conclusion.
Would I still trust MD5 to be a strong check against random/non-malicious corruption? Yes. Would I trust it against malicious corruption? No.
I think a lot of semi-delusion happens around whether cryptographic hash functions are actually being used for integrity, and what kind of integrity that is supposed to mean.
I would argue against this two ways: First, if the only concern was bit-error detection, forensic tools would use a simple checksum. CRC, for example, is far faster to compute than any of these cryptographic functions. Second, one of the functions of evidential custody is to provide some defense against malicious tampering with evidence, even if unlikely. And for this purpose MD5 is clearly an inferior choice to a SHA2 function.
The real reason that forensics tools stick to MD5, I would contend, is a combination of the state of the art in filesystem forensics tools being surprisingly bad and bureaucratic lock-in related to documentation and interoperability (the chain of custody form asks for MD5, after all, not SHA512).
This kind of factors into one of my hills to die on, which is that careless or cargo-cult selection of hash functions causes problems. When you select a hash or integrity function for files, you need to make a clear decision about whether your goal is to resist collisions or simply to detect bit errors. You should then choose a function that is state of the art for one of those two categories. The whole line of reasoning behind "MD5 is good enough because we're not really using it for security" is just asking for trouble. When you split the difference you either have a bad integrity check, a bad security control, or more likely both (MD5).
> CRC, for example, is far faster to compute than any of these cryptographic functions.
...and offers far less protection even from random corruption; the biggest common CRC is 64 bits, which is really small in comparison to 128-bit MD5 or 160-bit SHA1.
I assume GP just used CRC as an example everyone is familiar with. A more modern non-cryptographic hash function like FNV or MurmurHash can produce more output bits (up to 1024 and 128 respectively), reducing random collision probability to a comparable level.
In addition, CRCs are actually quite good at detecting common errors in data transfer (not just single bit flips but burst errors as well). In fact you can prove that certain kinds of errors will always be detected by an n-bit CRC -- which is something you can't prove with modern cryptographic hash functions. Yeah they're useless against attackers but that's not the threat model if you're "not using MD5 for security".
If you're dealing with a situation in which errors occur at a rate high enough to see random collisions in even CRC16, you need FEC if not spread-spectrum techniques. CRCs can provably detect various error patterns of up to n bits, depending on the choice of polynomial... and CRC dates back many years and is far from the state of the art. MD5 can produce collisions with as little as 6 bits in error, far less than the 128-bit length would suggest.
Cryptographic hash functions are designed with different objectives and methods and are often less well suited, and almost always less well analyzed, for simple error-detection situations.
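If anyone wants a quick sanity check of the single-bit-error claim, here's a throwaway demo (Python, zlib.crc32) -- not a proof, just showing the guarantee in action:

    import zlib

    data = bytearray(b"a block of data crossing a noisy link")
    original = zlib.crc32(data)

    # Flip one bit anywhere in the payload: a CRC always detects
    # any single-bit error, so the checksum must change.
    data[7] ^= 0x01
    assert zlib.crc32(data) != original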
Since you're not doing any birthday attacks in checking for corruption, 64 bits is perfectly fine. Corruption checking with 64 bits is safer than many uses of md5 would be, even if md5 had no flaws.
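Back-of-the-envelope version of that (my own numbers, nothing more): when you check one file against one stored checksum there's no birthday effect, so a random corruption slips through with probability about 2^-k for a k-bit hash.

    # Chance that a randomly corrupted file still matches its stored k-bit
    # checksum (one file vs. one stored value, so no birthday effect).
    def undetected_corruption_probability(hash_bits):
        return 2.0 ** -hash_bits

    print(undetected_corruption_probability(64))   # ~5.4e-20 per check
    print(undetected_corruption_probability(128))  # ~2.9e-39 (MD5-sized output)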
I could understand if hashing is a large portion of their CPU time. But if it's not, why not compute MD5 and also something good and store both hashes?
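If the tooling exposes the hashing step at all, storing both is close to free, since you only read the data once. Rough sketch of what I mean (my own, not from any forensics product):

    import hashlib

    def dual_digest(path, chunk_size=1 << 20):
        """Read the file once, feeding an MD5 and a SHA-256 digest in parallel."""
        md5, sha256 = hashlib.md5(), hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                md5.update(chunk)
                sha256.update(chunk)
        return md5.hexdigest(), sha256.hexdigest()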
Yes this is a good caveat. When I've tried the OpenSSL implementations on older ARM chips (a Raspberry Pi Zero), I've seen MD5 come out on top. I'm not sure exactly why it switches.
I use SHA1 for duplicate detection on hard drives. It's more robust than MD5 and fast on modern processors. Nothing wrong with that. But I would not use it for crypto.
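For anyone curious, the general shape of that (a sketch, not my actual tooling) is just grouping files by digest:

    import hashlib
    import os
    from collections import defaultdict

    def find_duplicates(root):
        """Group files under `root` by SHA-1 digest; buckets with >1 entry are duplicates."""
        by_digest = defaultdict(list)
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                h = hashlib.sha1()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                by_digest[h.hexdigest()].append(path)
        return {d: paths for d, paths in by_digest.items() if len(paths) > 1}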
Against someone malicious swapping out drives for a malicious end? Or as a guard against accidentally swapping out the wrong hard drive?
For the former, MD5 isn't secure (although it's not really an attack you can pull off after the fact). For the latter, I guess so, but it'd be hilarious if whatever attacker is causing you to do forensics made all the hard drives have the same hash to make your life more complicated.
Forensics work mostly involves people doing criminal or civil wrongs.
Network intrusions, or someone sending a fucked-up email at work, would require you to retrieve data, whether that's some logs or a full hard disk copy.
The idea is that I do analysis and can prove that what I started with (hash1) is the same as when I ended (hash2). If the hash changed, either there was a hardware failure (on write blockers) or a mistake was made when analysing the data.
If they are already good at keeping multiple points of view in mind (that MD5 can be broken for one application but still works for others), then they should also be able to reason that MD5 might work passably for this application, but it is time to move on.
That's kinda the issue: you only want to guarantee that the image didn't change (i.e., no one planted anything and the chain of evidence from the crime scene to your nice forensics lab is untampered with).
I think once someone makes a tool that allows you to add a file to an image and have it resolve to the same hash (anti-forensics as a field), it will probably change. Most tools do offer dual MD5 and SHA256 support though.
Obviously that example is a chosen prefix collision, but this data is coming from an untrusted source after all, so there’s really nothing to stop the attacker choosing the prefix in advance and then publicly shredding trust in the hash at a later date. In practice it sounds like you’d also have a hash of the complete file system, but at this point you’d have to question what advantage there is to using MD5 at all. Attacks never get worse, only ever get better, and the last thing you’d want is for the dam to burst during a lengthy and important investigation.
Being able to construct B such that hash(B) = hash(A) for some given document A is a second pre-image attack, and there is no viable second pre-image attack for SHA-1 or even MD5.
What does exist is collision, where you construct documents C and C' such that hash(C) = hash(C').
This means bad guys can persuade humans (and more importantly in most cases, machines) to sign seemingly innocuous document C, and then produce apparent proof that document C' was signed, even though the human / machine has never seen C' and would not have signed it.
For deduplicated storage you really want A=B <=> H(A)=H(B) to hold for all data you are going to see, so accessible constructions for collisions should already bar an algorithm from this application.
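And if you don't want to bet everything on that equivalence, the defensive variant is cheap: treat a digest match as a candidate and compare bytes before actually deduplicating. Toy sketch of what I mean (hypothetical, not any real product):

    import hashlib

    class DedupStore:
        """Toy content-addressed store that doesn't blindly trust H(A)=H(B) => A=B."""

        def __init__(self):
            self._blobs = {}  # digest -> blob

        def put(self, blob):
            digest = hashlib.sha256(blob).hexdigest()
            existing = self._blobs.get(digest)
            if existing is not None and existing != blob:
                # H(A) = H(B) but A != B: the property we rely on has failed.
                raise ValueError("hash collision detected")
            self._blobs[digest] = blob
            return digest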
I would, depending on what I was protecting. I use it every day in Git, for example, on projects where integrity is of moderate importance. Now, against a nation-state? No way.
Against nation state, that ship sailed a long time ago.
We're currently in the realm of 10-20 thousand dollars to make SHA-1 collisions. That's in the realm of an individual, albeit a reasonably (but not extremely) wealthy one. In 5-10 years it may be in the realm of J. random hacker.
I've never seen a colliding pair in a "true" hash function (not just CRC) be of differing lengths. They've always been a pair of blocks with very few differing bits, such that the differences "cancel out" in the computation and the internal state is the same regardless of which of the pair is run through it. I suspect trying to create a colliding pair with differing lengths is much more difficult, since SHA1 also includes the length as part of the calculation.
I'm a bit divided. On one hand I agree with you, and I even used MD5 for things that don't need security, but then I started thinking about it and came to the conclusion that you never know how your code might be exploited in the future.
It's not "strange"... earlier hash algorithms tend to enjoy better performance and compatibility, and there are still use cases for which SHA1 (and even MD5) aren't broken.
SHA-256 has special instructions in silicon for it. Performance is not a reasonable excuse. If performance mattered and you didn't need SHA's security properties, you would be using CRC32 or something.
But there are even better choices for non-cryptographic hash functions (FNV and MurmurHash for instance) because they were designed without the need to have a tradeoff between cryptographic security and efficiency.
You're assuming that all code performing SHA requires a "secure" hash.
It's only "broken" for security applications, and there are tons of others for which security is not even a consideration, or where collision is not possible due to input constraints.
There are hashes which require much less computing power and thus have better throughput if you don't require cryptographic security (examples include FNV-1 and MurmurHash). Using SHA-1 because you "don't need it for security" is still a poor choice.
I'd also argue that the default choice of hash function should always be a secure one until a strong counterargument is given, because it turns out that at some point most developers assume hashes have some minimal security properties (Linus always claimed that the choice of hash function in Git was not about cryptographic security -- and yet the security of the GPG signing of Git objects depends entirely on SHA-1 being cryptographically secure because Git signs the commit hash, and the BitKeeper attack against Linux's source tree only failed because the attack was done so poorly they didn't try to find a collision in the commit ID hash function). Better to be safe than sorry.
I agree SHA-1 is a poor choice for signing git objects, for reasons that should have been obvious when it was implemented.
Unfortunately I don't think the same developers who are assuming all hashes have some kind of magic security properties, are the ones who would follow your best practice for default choice of hash functions.
Also I wish some of those higher throughput hashes were available in e.g. python's hashlib by default. For applications that can include minimal or no external packages, hashlib.sha1 still turns out to be the best choice in certain cases.
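It's at least easy to measure what the stdlib gives you; a throwaway benchmark like this (numbers will obviously vary by machine and OpenSSL build):

    import hashlib
    import timeit

    payload = b"x" * (1 << 20)  # 1 MiB of dummy data
    for name in ("md5", "sha1", "sha256", "blake2b"):
        algo = getattr(hashlib, name)
        seconds = timeit.timeit(lambda: algo(payload).digest(), number=200)
        print(f"{name:8s} {seconds:.3f}s for 200 MiB")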
Someone in my company is using MD5 for passwords. This is recent software, like built in the last decade. Passwords were not a mystery to anyone when it was written, at least not anyone who cared about security.
I don't know what to say about it. It's not the first time something dumb has happened, and I'm tired of screaming at the brick wall.
Depends on how they use it. HMAC-MD5 is not broken, for example. MD5 could potentially be a building block for password hashing still (but it has to be a _building block_)
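For instance, PBKDF2 will happily use MD5 as its underlying HMAC hash, which is the "building block" sense. Illustrative only -- for anything new you'd reach for bcrypt/scrypt/Argon2 instead:

    import hashlib
    import os

    # MD5 as a *building block*: PBKDF2 with HMAC-MD5 underneath.
    # Shown only to illustrate the point; pick a modern KDF for real systems.
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("md5", b"hunter2", salt, 200_000)
    print(derived.hex())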
This is one of the many ways that shows how broken (in the "defective software" sense, not necessarily directly the cryptographic sense, yet) the entire PGP/GPG ecosystem is.
Did my research here. I was expecting GnuPG to never emit a message including SHA1, just use it to verify signatures from the existing ones. A pretty sensible thing, AFAIK.
To my surprise, gpg2 added SHA1 as a valid MD algo in a new key because the specification says it MUST be supported and implementations should use it if no other is found in key preferences. However, it is the one with the lowest priority and will not be used with any key that was generated in the last decade or had its preferences changed to use another MD.
A key created with the default options:
gaius@baseship:/tmp$ mkdir newhome
gaius@baseship:/tmp$ gpg --homedir newhome --gen-key
gpg: WARNING: unsafe permissions on homedir '/tmp/newhome'
gpg (GnuPG) 2.2.4; Copyright (C) 2017 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
gpg: keybox '/tmp/newhome/pubring.kbx' created
Note: Use "gpg --full-generate-key" for a full featured key generation dialog.
GnuPG needs to construct a user ID to identify your key.
Real name: Gaius Baltar
Email address: baltar@defense.caprica.kob
You selected this USER-ID:
"Gaius Baltar <baltar@defense.caprica.kob>"
Change (N)ame, (E)mail, or (O)kay/(Q)uit? o
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
gpg: /tmp/newhome/trustdb.gpg: trustdb created
gpg: key 0832341A4A0CE664 marked as ultimately trusted
gpg: directory '/tmp/newhome/openpgp-revocs.d' created
gpg: revocation certificate stored as '/tmp/newhome/openpgp-revocs.d/C64E6636890D8BAACA843AFC0832341A4A0CE664.rev'
public and secret key created and signed.
pub rsa3072 2020-10-24 [SC] [expires: 2022-10-24]
C64E6636890D8BAACA843AFC0832341A4A0CE664
uid Gaius Baltar <baltar@defense.caprica.kob>
sub rsa3072 2020-10-24 [E] [expires: 2022-10-24]
gaius@baseship:/tmp$ gpg --homedir newhome --edit-key baltar
gpg: WARNING: unsafe permissions on homedir '/tmp/newhome'
gpg (GnuPG) 2.2.4; Copyright (C) 2017 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Secret key is available.
gpg: checking the trustdb
gpg: marginals needed: 3 completes needed: 1 trust model: pgp
gpg: depth: 0 valid: 1 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 1u
gpg: next trustdb check due at 2022-10-24
sec rsa3072/0832341A4A0CE664
created: 2020-10-24 expires: 2022-10-24 usage: SC
trust: ultimate validity: ultimate
ssb rsa3072/89519D45B7AECE4D
created: 2020-10-24 expires: 2022-10-24 usage: E
[ultimate] (1). Gaius Baltar <baltar@defense.caprica.kob>
gpg> showpref
[ultimate] (1). Gaius Baltar <baltar@defense.caprica.kob>
Cipher: AES256, AES192, AES, 3DES
Digest: SHA512, SHA384, SHA256, SHA224, SHA1
Compression: ZLIB, BZIP2, ZIP, Uncompressed
Features: MDC, Keyserver no-modify
gpg>
I don't know that the hash preferences in the public key are relevant to the issue talked about in the linked email thread. Currently GPG (and Sequoia) will reject SHA1 signatures on keys if those signatures claim to be recent.
The preferences would allow you to send me two messages whose signatures used the same SHA1 hash. I would never know that it had happened, as I would never see the hash, only the valid signature. Both signatures would be logically valid as they both came from your deliberate action.
Au contraire, this is a good example of how well the PGP/GPG ecosystem deals with the realities of a long term cryptography standard. People with SHA1 hashes in their signatures have a reasonable way to transition away from those signatures while maintaining the reputation of those identities. There really isn't anything out there as good.
I should also point out that the attack this is addressing (Shambles) has close to no practical implications. So this is a good example of how the PGP/GPG ecosystem takes security breakage very seriously.
I don't think git is using SHA1 for security, though (outside of potentially when signing commits, but that's generally done with gpg, which you drew a distinction with). Assuming you're talking about commit hashes, I'm not particularly worried about the threat of someone crafting a commit with a malicious hash for my repo, because I have much bigger issues than commit hash duplicates if someone malicious is able to commit to my repo. Given that, SHA1 is just as safe to use for my commit hashes as it's always been; the fact that crafting collisions on purpose has gotten easier doesn't make accidental collisions any more likely to happen.
How would hashing twice help with a collision? Surely if your hash function is A(C()) then if you find a collision in C() then that is also a collision in A(C())?
Also, just saying “we should use 100 year crypto” is about as useful as saying “we should hurry up and cure cancer”.
Nobody has a provably secure hashing function. It's actually a huge problem in cryptography that a lot of the time hashing is modelled as a perfect random oracle, and there are schemes which are secure only if modelled with a random oracle, but insecure when instantiated with any concrete hashing function.
Probably means (A(text) XOR C(text)) or something. Then you would need to find a collision in both algorithms for the same input and that will be arguably much harder.
I.e. just because someone demonstrates a lack of knowledge in his/her post, you can still show some flexibility for positive interpretation ;).
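Concretely, I read the suggestion as something like this (a sketch; I'm picking SHA-256 and BLAKE2s only because they're both 256-bit and in the stdlib, not because they're the "American" and "Chinese" designs):

    import hashlib

    def xor_combined(data):
        """XOR of two independent 256-bit digests, per the reading above."""
        a = hashlib.sha256(data).digest()
        c = hashlib.blake2s(data).digest()  # also 32 bytes by default
        return bytes(x ^ y for x, y in zip(a, c)).hex()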
If you xor the hashes together, it may be as weak as the weaker of the two functions, though it probably depends on their structure.
Getting much stronger hashing with double the compute time is easy. Double the number of rounds - most algorithms have about 30-60% security margins. It would probably avoid any breaks of the hash (so 80 bits for SHA1), though it can't be enough if the hash is simply too small (64-bit security for MD5/4/2).
The two collisions are not independent of each other, so it's more like finding the collision in whichever function is stronger. (I.e. finding a collision in md5 + sha1 is roughly as hard as finding a collision in just sha1.)
In the past (long enough ago that using md5 almost made sense), I used a combination of md5 and sha1 to verify file integrity. Both checksums had to match or the file was rejected.
It seems obvious that that's at least as strong as using sha1 by itself. I've always assumed it was significantly stronger than sha1 by itself, but you're saying it isn't.
Creating an md5 collision isn't terribly difficult these days, but I've never heard of an md5 collision that's also a sha1 collision.
Can you cite a source with more information on this?
(I'm not doubting what you say, I'd just like more information.)
Although looking over this again, there are some caveats: if both hash functions suffer from shortcut attacks, you might not be able to combine both shortcut attacks, depending on the details of those attacks.
Regardless, in the sha1+md5 case, even if you can't combine both shortcut attacks (big if - not sure if that's true or not), the complexity of the sha1 collision is roughly the same as a birthday attack on md5, so I think you still end up with close to the same big-O as if you could combine both shortcut attacks.
We're remarkably close--ENIAC was first powered up in 1946. Kinda cool when you think about it--electronic computers are WWII-era technology and a lot of us reading this now will see their 100-year anniversary.
> But as far as I know, any two different hashing schemes applied sequentially has never been broken.
Suppose you used SHA256(MD5(x)) as your sequential hashing function.
If I can find two messages m1, and m2 that have the same MD5 digest, the outer SHA256 will also collide. MD5 is sufficiently broken that finding such messages is easy: https://github.com/thereal1024/python-md5-collision
Two different hashing schemes applied in parallel is better. However even there many uses of hashing functions require all the bits in the output digest to be equally unpredictable.
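To make the nesting point concrete, here's a minimal sketch of why the outer hash buys nothing; the colliding inputs themselves would come from tooling like the repo linked above:

    import hashlib

    def nested(data):
        # SHA256 over the 16-byte MD5 digest: the SHA256(MD5(x)) construction above.
        return hashlib.sha256(hashlib.md5(data).digest()).hexdigest()

    def outer_collides(m1, m2):
        """Any MD5-colliding pair m1 != m2 collides under the nested hash too."""
        assert m1 != m2
        assert hashlib.md5(m1).digest() == hashlib.md5(m2).digest()
        return nested(m1) == nested(m2)  # necessarily True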
Two hashing schemes in parallel has problems too. Suppose your parallel hash is SHA3+Ident, where Ident is either the first 100 bytes of the message or the message padded with NUL up to 100 bytes. Because Ident has no preimage resistance, the combo SHA3+Ident has very poor preimage resistance too.
By "in parallel" I don't mean XOR. I mean concatenation. So in my example sha256+md5 would concatenate sha256's 256 bit digest, and md5's 128 bit digest, to form a 384 bit digest.
The intuitive explanation for why XOR instead of concatenation doesn't work, is it lets the output of the broken hash function influence the output of the remaining good hash function. That is obviously cause for concern!
> The intuitive explanation for why XOR instead of concatenation doesn't work, is it lets the output of the broken hash function influence the output of the remaining good hash function. That is obviously cause for concern!
The bits of a good hash function should be random. XORing with some unrelated-ish thing should be no problem. So that's not enough of an explanation.
I still don't understand what you're claiming is wrong with XOR. As I understand it, the XOR of two secure hashes should be at least as strong as each one. Of course you can find ways to deliberately come up with a broken scheme if you really want to do something spectacularly dumb (like XORing a variation of the same algorithm with itself), at which point the problem isn't really XOR. But assuming you're being reasonable and showing some sense in the process, the likelihood of someone somehow finding an exploitable correlation between the two without breaking either one should be astronomically small, to the point of not warranting a second thought.
Just as a mental exercise (I'm no expert), how about: GoodHash = HF(input) + HF(reversed-input)?
Your final hash value will still be twice as big, yes, but reversing the input should yield a different result? This is of course for non-crypto usage.
If one of your hashes is iterated, you also have the multicollision problem: once you find a collision in your weak, iterated hash, you can quickly produce more collisions, so that the strength of your construction just devolves to that of the stronger hash.
I believe he is responding to "Two different hashing schemes applied in parallel is better."
My understanding is if you have a collision in the stronger function, you have an infinite family of collisions in the stronger function due to the length extension attack, from there you should be able to base the collision for the weaker function on one of the stronger function's (length extended) collisions.
This is just as weak as C. Any collision in C is by definition a collision in A(C(data)).
You probably meant A(data) ^ C(data) [XOR], which is mostly only as strong as the stronger of A and C. Google multicollisions in iterated hash functions for more details.
Hashing twice doesn't actually help in many cases that hashes are used. And I don't think there is a such thing as a "provably secure hashing algorithm", although I would very much be interested in learning about the topic if you have a reference :)
If the point of using two hashes is resistance against people breaking one of the hashing algorithms used, you’re always going to have to pick more than one algorithm to get that benefit. However, I don’t think nesting the calculations is actually necessary: Blake2(msg)+sha512(msg) would give you the same protection.
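In stdlib terms that's just concatenating the two digests, so a forger would need a simultaneous collision in both. Sketch:

    import hashlib

    def combined_digest(msg):
        """BLAKE2b digest followed by SHA-512 digest (64 + 64 = 128 bytes)."""
        return hashlib.blake2b(msg).digest() + hashlib.sha512(msg).digest()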
> Therefore, why not get the Americans to develop their best hashing algorithm, the Chinese to develop their best algorithm, and then use A(C(data)).
Would it not be better to report both A(data) and C(data) (or maybe all three), so you have to simultaneously find a collision in both? Using A(C(data)), finding a collision in C(data) gives a collision in A(C(data)) as well.
Further, there are other considerations - it doesn't matter how secure the hash is if it's unusably slow.
Depends what you mean by "provably secure." Hashing algorithms are usually only practical because of what compromises they make to achieve "good enough."
- At the most basic level, (practical) hashing cannot map every input to a distinct output, since consistent output length is usually highly desirable. This means that for sufficiently long inputs you will get collisions. What often determines how good a hash is, is how hard those collisions are to find.
- If your inputs map 1:1 to outputs, rainbow tables become harder but you open yourself up to a number of other possible attacks starting with the obvious length analysis.
- Hashing also often has to be very performant, which leads to its own compromises, but in many applications (like message digests) throughput performance is given a lot of weight.
In the most formal way, I guess it's possible to have a provably secure hash for all possible attacks, but this would most likely require a 1:1 mapping between inputs and outputs. This would be impractical for a few reasons.
The output has to be indistinguishable from random, and colliding n bits needs to take an exponential amount of time.
Output length can be anything. As long as it's indistinguishable from random, the chance of collision will be exponentially rare, and you can truncate or concatenate to get the length you want.
You're correct on all points, though this kind of implementation (with variable-length output) is fairly rare AFAIK. Having a fixed-length output lets you reliably define for example a database field length.
I'd love to see a practical implementation of that though, if you know where to find that kind of project off-hand!
Ya, there are also speed considerations when it comes to hashing that are more important than total security. If your mobile phone catches on fire, or your bandwidth is cut by half, the extra security isn't worth the trade.
Also go back to 1920 and design a public crypto system that can't be broken today.