NIST is announcing that SHA-1 should be phased out by Dec. 31, 2030

px43 · on Dec 15, 2022

Good job SHA-1, you had a good run and didn't get too broken before retirement, which is quite an accomplishment for anyone these days.

Does anyone in the field know if there's a SHA-4 competition around the corner, or is SHA-3 good enough? It would be interesting if one of the SHA-4 requirements was Zero Knowledge Proof friendlyness. MiMC and Poseidon are two of the big contenders in that area, and it would be really great to see a full NIST quality onslaught against them.

twiss · on Dec 15, 2022

Not only is SHA-3 good enough, for most applications SHA-2 is still good enough as well. So personally I would be surprised to see a new competition for a general-purpose hashing algorithm anytime soon.

SAI_Peregrinus · on Dec 15, 2022

Agreed. Blake3 is super promising as cryptographic hash function family due to its software performance (not sure if anyone has tried making hardware designs for it), but SHA2 hardware acceleration is extremely common and makes it good enough. And while SHA3 software performance is poor it's quite good in hardware, so as more chips accelerate it it'll become generally viable. So while Blake3 looks nice, there's not a whole ton of need for it right now, and I doubt it'll become SHA4 ever.

aliqot · on Dec 15, 2022

Sha3 is performant, but I'll always give it the stink eye because of NIST selecting the winner, then modifying their solution before standardization without sufficient explanation. Read the room, this is cryptography; one does not simply add mystery padding and rush off to the printing press.

https://www.schneier.com/blog/archives/2013/10/will_keccak_s...

https://en.wikipedia.org/wiki/SHA-3#Padding

WhitneyLand · on Dec 15, 2022

How much did your opinion change after reading the 10/5 update to Schneier’s post?

From your first link:

I misspoke when I wrote that NIST made “internal changes” to the algorithm. That was sloppy of me. The Keccak permutation remains unchanged. What NIST proposed was reducing the hash function’s capacity in the name of performance. One of Keccak’s nice features is that it’s highly tunable.

I do not believe that the NIST changes were suggested by the NSA. Nor do I believe that the changes make the algorithm easier to break by the NSA. I believe NIST made the changes in good faith, and the result is a better security/performance trade-off.

booi · on Dec 15, 2022

Nice try NSA, that’s exactly what you want us to believe!

jk I’m sure it’s fine… probably

macksd · on Dec 15, 2022

I'm sure it's fine if you use James Mickens' Mossad / Not-Mossad threat model.

a1369209993 · on Dec 15, 2022

Can't speak for aliqot, but I am now somewhat more confident that the NIST changes were suggested by the NSA, and slightly more confident (or at least less unconfident) that SHA-3 is insecure.

I still think it's probably fine, but I feel better about insisting on SHA-2+mitigations or blake3 instead now, even if the main problem with SHA-3 is its being deliberately designed to encourage specialized hardware accelleration (cf AES and things like Intel's aes-ni).

(To be clear, the fact that Schneier claims to "believe NIST made the changes in good faith" is weak but nonzero evidence that they did not. I don't see any concrete evidence for a backdoor, although you obviously shouldn't trust me either.)

andreareina · on Dec 16, 2022

Does Schneier have some association to the NSA I don't know about? I'd normally consider that statement as weak evidence they did make the changes in good faith.

a1369209993 · on Dec 23, 2022

A bit late myself, but to clarify:

Making late (in this case, after the competition was already over) changes to a crytographic primitive - without extensive documentation both of why that's necessary (not just helpful) and why it's not possible (not just you promise it doesn't) for that to weaken security or insert backdoors - is a act of either bad faith or sufficiently gross incompetence that it should be considered de facto bad faith.

Schneier claiming to believe that it's good faith implies that he either doesn't understand or (presumably more likely given his history?) doesn't care about keeping the standardization process secure against corruption by the NSA, which suggests either incompetence or bad faith on Schneier's part as well. (Or, in context, that someone was leaning on him after the earlier criticism.)

This is particularly inexcusable since reducing security parameters on the pretense that "56 bits ought to be enough for anybody" is a known NSA tactic dating back to fucking DES.

TedDoesntTalk · on Dec 16, 2022

> Does Schneier have some association to the NSA I don't know about?

Do you think he’d disclose it if he did?

andreareina · on Dec 16, 2022

What does that have to do with anything? AFAIK Schneier's public actions are consistently pro-privacy, pro-encryption. Implying otherwise should be accompanied by evidence don't you think?

Perseids · on Dec 15, 2022

It is fair to criticize NIST for enabling rumors about them weakening SHA3, but these are rumors only, nothing more. Please, everyone, stop spreading them. SHA3 is the same Keccak that has gone through extensive scrutiny during the SHA3 competition, and, as far as anyone can tell, it is rock solid. Don't trust NIST, don't trust me, trust the designers of Keccak, who tell you that NIST did not break it:

> NIST's current proposal for SHA-3, namely the one presented by John Kelsey at CHES 2013 in August, is a subset of the Keccak family. More concretely, one can generate the test vectors for that proposal using the Keccak reference code (version 3.0 and later, January 2011). This alone shows that the proposal cannot contain internal changes to the algorithm.

https://keccak.team/2013/yes_this_is_keccak.html

yarg · on Dec 15, 2022

Skein always seemed interesting to me: https://en.wikipedia.org/wiki/Skein_(hash_function)

But I've had a soft spot for Schneier since this (pre-Snowden) blog post on Dual_EC_DRBG: https://www.schneier.com/blog/archives/2007/11/the_strange_s....

pilsetnieks · on Dec 15, 2022

> Read the room, this is cryptography; one does not simply add mystery padding and rush off to the printing press.

In fact, one actually has done almost exactly that: https://en.wikipedia.org/wiki/Data_Encryption_Standard#NSA's...

dchest · on Dec 15, 2022

All good changes, giving us SHAKE functions and the rest of useful variants based on domain separation in padding.

15155 · on Dec 16, 2022

I've implemented SHA-3 and Keccak in hardware (FPGA) and software (CPU, GPU) countless times: there's zero scenario where this single byte change occurring before 24 rounds of massive permutation has any measurable effect on the security of this hash function.

nicolashahn · on Dec 15, 2022

how else do you add a backdoor?

Dylan16807 · on Dec 15, 2022

The original padding, as bits, was 100000...0001

The new padding prepends either 01, 11, or 1111 depending on the variant of SHA-3. That way the different variants don't sometimes give the same [partial] hash.

It was weird to toss that in at the last second but there's no room for a backdoor there.

KMag · on Dec 15, 2022

As far as Blake3 in hardware for anything other than very low-power smart cards or similar:

Blake3 was designed from the ground up to be highly optimized for vector instructions operating on four vectors, each of 4 32-bit words. If you already have the usual 4x32 vector operations, plus a vector permute (to transform the operations across the diagonal of your 4x4 matrix into operations down columns) and the usual bypass network to reduce latency, I think it would rarely be worth the transistor budget to create dedicated Blake3 (or Blake2s/b) instructions.

In contrast, SHA-3's state is conceptually five vectors each of five 32-bit words, which doesn't map as neatly onto most vector ISAs. As I remember, it has column and row operations rather than column and diagonal operations that parallelize better on vector hardware.

SHA-2 is a Merkle-Damgard construction where the round function is a Davies-Meyer construction where the internal block cipher is a highly unbalanced Feistel cipher. Conceptually, you have a queue of 8 words (32-bit or 64-bit, depending on which variant). Each operation pops the first word from the queue, combines it in a nonlinear way with 6 of the other words, adds one word from the "key schedule" derived from the message, and pushes the result on the back of the queue. The one word that wasn't otherwise used is increased by the sum of the round key and a non-linear function of 3 of the other words. As you might imagine, this doesn't map very well onto general-purpose vector instructions. This cipher is wrapped in a step (Davies-Meyer construction) where you save a copy of the state, encrypt the state using the next block of the message, and then add the saved copy to the encrypted result (making it non-invertible, making meet-in-the middle attacks much more difficult). The key schedule uses a variation on a lagged Fibonacci generator to expand each message block into a larger number of round keys.

oconnor663 · on Dec 16, 2022

> Blake3 was designed from the ground up to be highly optimized for vector instructions operating on four vectors, each of 4 32-bit words.

This is true, and the BLAKE family inherits this structure from ChaCha, but there's also more to it than that. If you have enough input to fill many blocks, you can run multiple blocks in parallel. In this situation, rather than dividing up the 16 words of a block into four vectors, you put each word in a different vector, and the words of each vector represent the same position in different blocks. (I.e rather than representing columns or rows, the vectors point "out of the page.) There are several benefits to this arrangement:

1. You don't need to do that diagonalization operation anymore.

2. If your CPU supports "instruction-level parallelism" for vector operation, working across the four words/vectors in a row gets to take advantage of that.

3. Best of all, you're no longer limited to 4-word vectors. If you have enough input to fill 8 blocks (AVX2) or 16 blocks (AVX-512), you can use those much larger instruction sets.

This is all easy to take advantage of in a stream cipher like ChaCha, because each block is independent. With a hash function, things are more complicated, because you usually have data dependencies between different blocks. That's why the tree structure of BLAKE3 (or somewhat similarly, KangarooTwelve) is so important for performance. It's not just about multithreading; it also about SIMD. See section 5.3 of the BLAKE3 paper for more on this.

silisili · on Dec 15, 2022

Random question - I see BLAKE written in caps. Is it an acronym of something?

detaro · on Dec 15, 2022

I don't think so? BLAKE is an evolution of an earlier proposal for a family of hash functions called LAKE, but the paper for that does not explain the name at least.

StreamBright · on Dec 15, 2022

> Blake3 is super promising as cryptographic hash function family due to its software performance

I am shocked how fast it is. Just tried it on several files. It is blazing fast.

brew install b3sum on mac.

whateveracct · on Dec 16, 2022

Oh yeah Blake3 is just the best if it's accessible to you. I get that other algorithms are more ubiquitous. But if I need a crypto hash, I consider Blake first.

15155 · on Dec 16, 2022

As a hardware developer: Blake3 is better than SHA1/2, but still far inferior to SHA-3 with regards to implementation area or ease.

dataflow · on Dec 15, 2022

> for most applications SHA-2 is still good enough as well.

For what applications is it not good enough, and how/why do we believe that?

twiss · on Dec 15, 2022

Applications that care about length extension attacks: https://en.wikipedia.org/wiki/Length_extension_attack (though most applications shouldn't care about it, and not all functions in the SHA-2 family are vulnerable to it, as tptacek notes. See https://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functi...).

tptacek · on Dec 15, 2022

There are essentially no applications where SHA2 isn't good enough.

(You could contort yourself into an argument that "SHA2 isn't good enough for protocols where you need a keyed hash without HMAC", but 1) that isn't true given SHA2-384 and 2) there really are no such protocols).

The SHA1 break doesn't threaten SHA2. The two hashes are different in a significant way that breaks the SHA1 attack.

cperciva · on Dec 16, 2022

The SHA1 break doesn't threaten SHA2. The two hashes are different in a significant way that breaks the SHA1 attack.

Hmm, I think you might be overstating this a bit. While the SHA1 attack indeed does not break SHA2, the whole reason SHA3 exists at all is that SHA1 and SHA2 are similar enough in their structure that we were worried that the methods used in the SHA1 break could be extended to attack SHA2.

So far the answer seems to be no, but it was a serious concern for a while.

tptacek · on Dec 16, 2022

How am I overstating this? (I'm asking seriously, and also working on a piece about this). For what it's worth: I'm relaying something Marc Stevens said, and a superficial read of the Shattered paper: that the key weakness in SHA1, not shared by SHA2, is a linear message schedule that makes it possible to find the differential paths the attack relies on.

Is it a serious concern among cryptographers outside of NIST?

late edit

I removed the scare quotes around "differential paths" after skimming the first Stevens paper and the Chabaud-Joux paper and confirming they were using "differential" they way I understand the term. :)

Also, I'm more confident that the message schedule is pretty central to the attack, and I guess the whole line of research that led up to it?

cperciva · on Dec 16, 2022

I guess it's a matter of how strongly you interpret the word "significant". There's absolutely a difference between SHA1 and SHA2 which results in the attack not working; but I'd personally characterize it as a minor tweak which turned out after the fact to have larger than anticipated benefits. I'd say that the difference between SHA2 and SHA3 is 32x larger than the difference between SHA1 and SHA2.

Is it a serious concern among cryptographers outside of NIST?

Today? Not really, because we've had years of research showing us that SHA2 still seems to be safe. But when the major breaks of SHA1 were happening? Absolutely. In my "everything you need to know about crypto in 1 hour" talk I explicitly said "use SHA2 but be ready to move to SHA3 if needed because the attacks on SHA1 are scary and we're worried they could generalize to SHA2 as well".

tptacek · on Dec 16, 2022

I can certainly see why you'd say SHA3 is 32x more different than SHA2 than SHA2 is from SHA1! SHA2 is closely related to SHA1, and SHA3 isn't. Like, I get what SHA3 is. :)

As for your points about anticipated benefits of the SHA2 message schedule: isn't part of the point of SHA2 that it's more ARX-y? Which is basically what makes the Shattered line of attacks not viable?

I think we're in the same place in terms of recommendations! Except: I don't know that "the SHA1 attacks could generalize" is all that valid a concern? Regardless, the "why" of this is super interesting and I don't think anyone has broken it down super clearly (I'm not doing it by myself; I don't have the chops).

cperciva · on Dec 16, 2022

isn't part of the point of SHA2 that it's more ARX-y?

I don't know that the NSA has ever published the internal discussions which resulted in SHA2, but I always thought the primary design purpose of SHA2 was to produce larger hashes (and thus avoid the 2^80 birthday attack on SHA1).

Except: I don't know that "the SHA1 attacks could generalize" is all that valid a concern?

It's not, now. It was in 2005/2006. Remember there wasn't just one SHA1 attack; there was a whole series of them. (And I'm guessing the NSA was particularly concerned given that some of the attacks were discovered by Chinese researchers.)

tptacek · on Dec 16, 2022

I lost 30 minutes trying to track down the same thing (any kind of official rationale for SHA2, or even contemporaneous public comment for 180-2) and yeah, my understanding as well is that the high-level design goal was hashes that had parity with AES.

dataflow · on Dec 15, 2022

Yeah that was my impression too. So why the urge to move on to SHA3 (and now the discussion around SHA4)?

twiss · on Dec 15, 2022

I don't think there's an urge to move to SHA-3, certainly I didn't mean to imply that, rather the opposite. I don't think there's much discussion about SHA-4 either, outside this particular comment thread :)

iIiketrains · on Dec 15, 2022

There are bitcoin asics that basically SHA-256 equiv of EFF DES cracker, though they are dSHA variant but I think it can be modified to run only half the rounds.

SilasX · on Dec 15, 2022

Isn't the entire Bitcoin mining incentive structure equivalent to a huge bug bounty on SHA-2 (256, specifically)?

Edit: Never mind, grand-uncle comment beat me to it:

https://news.ycombinator.com/item?id=34003485

fstanis · on Dec 16, 2022

IIUC, SHA-3 isn't actually better in any known way, it's just meant to be a vastly different algorithm so we'd have an alternative to the other ones.

15155 · on Dec 16, 2022

SHA-3 doesn't use addition: from a hardware implementation standpoint this is a huge, huge advantage because there's no carry propagation delay.

SHA-3 is by far and away the most elegant modern cryptographic hash algorithm.

LawTalkingGuy · on Dec 16, 2022

It's conceptually a lot different not just internally but in that it's equivalent to an automatically finalized hash and could eliminate a class of accidental misuse.

Taek · on Dec 16, 2022

sha3 is a lot more power efficient when implemented in hardware, which is nice when you have an IoT device that needs to do a lot of hashing.

It will also be really nice on amd_64 CPUs for doing large amounts of hashing if they ever get a sha3 instruction.

That said, hashing is rarely your bottleneck so there's not much urgency.

pclmulqdq · on Dec 15, 2022

The NIST likes having two standardized "good" hash functions. I doubt we will have a SHA-4 until SHA-2 starts getting close to a break. The SHA-3 competition started once SHA-1 started showing some possible weaknesses. Also, if SHA-2 ends up getting retired, Blake3 will likely become SHA-4: Blake2 lost to Keccak because Blake2 was too similar to SHA-2.

twiss · on Dec 15, 2022

NIST seems perfectly comfortable with having only one cryptographic primitive for a given category as long as there's high confidence in it. The reason we have two hash algorithms is that back when the SHA-3 competition was created, there was some uncertainty about the long-term security of SHA-2. That uncertainty has since subsided, I would say. But if SHA-2 does end up being considered insecure, as long as there's no reason to suspect that the same will happen to SHA-3, there's no reason to create a SHA-4 yet.

als0 · on Dec 15, 2022

> It would be interesting if one of the SHA-4 requirements was Zero Knowledge Proof friendlyness

What would this requirement look like and why would it be important for a hash function?

miga · on Dec 15, 2022

There is Post-Quantum Cryptography competition, with three e-signature algorithms selected for standardization: https://csrc.nist.gov/projects/post-quantum-cryptography

Given that NIST itself warns PQC algorithm may be unsafe after 2035, this should be considered SHA-4.

bawolff · on Dec 15, 2022

Sha-2 is already quantum safe.

Sha is not a digital signature algorithm. That is a different type of crypto primitive.

ShredKazoo · on Dec 16, 2022

>Given that NIST itself warns PQC algorithm may be unsafe after 2035

I thought post-quantum cryptography was supposed to be futureproof? Or am I misunderstanding

adastra22 · on Dec 15, 2022

Hash functions aren’t significantly impacted by quantum computers. You may need to use a longer construction (eg. SHA512 instead of SHA256), but that’s it.

bawolff · on Dec 15, 2022

Im not a cryptographer, im kind of curious - is it possible to combine a birthday attack with grover's algorithm to attack sha256 in 2^64 time?

adastra22 · on Dec 15, 2022

You would get cube root speedup instead of sqrt for a collision (birthday attack), or sqrt instead of brute force for a preimage. So SHA256 is secure from preimage attacks even with a quantum computer, and gives 2^80 protection against collisions. SHA-2/384 would be sufficient for 128-bit security against collisions.

jimrandomh · on Dec 15, 2022

SHA-256 is used for Bitcoin mining, which serves as an enormous bug bounty for both full and partial breaks (if you can efficiently find inputs where the output has lots of leading zeroes that's a partial break, and lets you mine bitcoin more efficiently). That's worth a lot of trust. I don't see any particular reason to think SHA-3 is better (though I'm not an expert) and unless I hear some indication that a problem has been found with SHA-2, I'll probably stick with it forever.

pclmulqdq · on Dec 15, 2022

Bitcoin mining uses double SHA-256. It tends to be harder to break doubled hash functions, since you don't have tools like length-extension attacks on the second round of hashing. For example, HMAC-SHA-1 is still secure (despite SHA-1 being pretty much broken), and it also uses a two-round hashing construction.

flatiron · on Dec 15, 2022

Even if it didn’t sha as a hmac is still fine because the length it takes to break. You can’t break sha1 in the time it takes for your packet to go through the hmac step.

anderskaseorg · on Dec 15, 2022

SHA-256 is vulnerable to a length extension attack. This is well-known to cryptographers; it doesn’t matter for some applications and can be worked around in others. But it still catches some developers unaware.

https://en.wikipedia.org/wiki/Length_extension_attack

tptacek · on Dec 15, 2022

There are increasingly few situations in which length extension really matters, because we know more about designing protocols than we did 20 years ago; even code that uses SHA3 or Blake2 tends to still use HMAC. Further, there are standard variants of SHA2 that don't have length extension (they're truncated and don't disclose their full state). It's better to not have length extension properties than to have them, but it's not really a big part of the desiderata.

flatiron · on Dec 15, 2022

I’m always confused when people slam sha on hmac. Is there a realtime sha back door I’m missing? Even sha 1 takes days/months to break.

tptacek · on Dec 15, 2022

I don't understand what you're asking here. Don't use HMAC as a password hash? HMAC keys should be a long string of uncorrelated bits, not ASCII strings?

AtNightWeCode · on Dec 15, 2022

It is often worked around. But from what I understand the length extension issue was raised during the design but ignored. Hashing is not encryption but some of the recommended encryption standards are so complicated that it is a risk just by itself.

SHA-384 is a truncated SHA-512. From the claims of sec people it does not offer more security when it comes to length attacks. But from how the algo works I would assume that it does.

Nist is also plain wrong about their calculations. Cause how long it takes to calculate a specific hash depends on the hardware available, not what theory books says. It may in practice be faster to calculate a hash with more bits.

adastra22 · on Dec 15, 2022

Bitcoin mining uses double SHA256 which is not subject to length extension attacks. (Not that it matters, because bitcoin block headers are fixed length.)

phh · on Dec 15, 2022

I didn't expect such attacks to exist, thanks for bringing that up. However that Wikipedia page seem to say sha-256 is ok since it's truncated?

Dylan16807 · on Dec 15, 2022

SHA-256 is not truncated. "SHA-512/256" is truncated. It means you do SHA-512 (with a different starting state) and then throw out half.

CBLT · on Dec 15, 2022

Sha-224 and Sha-384 are the truncated versions of Sha-256 and Sha-512 respectively.

My boring hash function of choice is Sha-384. The Sha-512 computation is faster on Intel hardware, and ASICS to crack it are far more expensive than Sha-256 because of bitcoin.

If you're hashing passwords or something, use a "harder" hash like Argon2 or Scrypt.

adrian_b · on Dec 15, 2022

SHA-512 is faster only on Skylake derivatives up to Comet Lake and on older Intel Core CPUs.

On Intel Atom starting with Apollo Lake (2016) and on Intel Core starting with Ice Lake (2019) and on all AMD Zen CPUs (2017), SHA-256 is implemented in hardware and it is much faster than SHA-512.

mindcandy · on Dec 15, 2022

In my research into Ethereum, I learned that in the creation of SHA-3, they did a lot of hammering on SHA-256 to see if new weaknesses could be discovered and addressed. The conclusion was that SHA-256 is still solid as far as anyone can tell. The SHA-3 process moved forward anyway so they could have a backup-plan handy in case some problem with SHA-256 pops up out of nowhere.

tromp · on Dec 15, 2022

> if you can efficiently find inputs where the output has lots of leading zeroes that's a partial break, and lets you mine bitcoin more efficiently

Depends on the type of break. If the break only allows finding a hash with 128+k leading zeroes in 2^{128+k/2} time, that would still be quite useless for bitcoin mining.

The break would have to cover the bitcoin regime of around 78 leading 0s.

dreamcompiler · on Dec 15, 2022

SHA-1 is still perfectly fine for some applications like detecting duplicate files on a storage medium (and it's less likely to produce a false positive than MD5) but it's been a bad idea for anything security related for a decade.

The biggest issue is that git still uses it, which presents a problem if you want to protect a repo from active integrity attacks.

dwheeler · on Dec 15, 2022

Git no longer uses SHA-1. It instead uses a variant called SHA-1DC that detects some known problems, and in those cases returns a different answer. More info: <https://github.com/cr-marcstevens/sha1collisiondetection>. Git switched to SHA-1DC in its version 2.13 release in 2017. It's a decent stopgap but not a grrat long term solution.

There is also work to support SHA-256, though that seems to have stalled: https://lwn.net/Articles/898522/

The fundamental problem is that get developers assumed that hash algorithms would never be changed, and that was a ridiculous assumption. It's much wiser to implement crypto agility.

woodruffw · on Dec 15, 2022

> The fundamental problem is that get developers assumed that hash algorithms would never be changed, and that was a ridiculous assumption. It's much wiser to implement crypto agility.

Cryptographic agility makes this problem worse, not better: instead of having a "flag day" (or release) where `git`'s digest choice reflects the State of the Art, agility ensures that every future version of `git` can be downgraded to a broken digest.

est31 · on Dec 15, 2022

That's the general anti-agility argument wielded against git, but note that git's use cases require it to process historic data.

E.g. you will want to be able to read some sha-1-only repo from disk that was last touched a decade ago. That's a different thing than some protocol which requires both parties to be on-line, say wireguard, in which instance it's easier to switch both to a new version that uses a different cryptographic algorithm.

Git has such protocols as well, and maybe it can deprecate sha-1 support there eventually, but even there it has to support both sha-1 and sha-2 for a while because not everyone is using the latest and greatest version of git, and no sysadmin wants the absolute horror of flag days.

miga · on Dec 15, 2022

It would be safer to forbid broken hashes after certain date, and consider only those earlier hashes that have been counter-signed by new algorithms.

lazide · on Dec 16, 2022

So then you can’t load an archived repo?

Dylan16807 · on Dec 15, 2022

Assuming reasonable logic around hashes, like "a SHA-2 commit can't be a parent of a SHA-1 commit", there wouldn't much in the way of downgrade attacks available.

bornfreddy · on Dec 15, 2022

Wow, smart! This would keep all the old history intact and at the same time force lots of people to upgrade through social pressure. I'd probably be angry as hell when that happened to me, but it would also work.

est31 · on Dec 16, 2022

FTR the current plan for git's migration is that commits have both SHA-1 and SHA-2 addresses, and you can reference them by both. There is thus no concept of "SHA-2 commit", or "SHA-1 commit". The issue is more around pointers that are not directly managed by git, e.g. hashes inside commit messages to reference an earlier commit (and of course signatures). Those might require a git filter-repo - like step that breaks the SHA-1 hashes (and signatures) to migrate to SHA-2, if that is desired.

AgentME · on Dec 15, 2022

SHA-1 was already known to be broken at the time Git chose it, but they chose it anyway. Choosing a non-broken algorithm like SHA-2 was an easy choice they could have made that would still hold up today. Implementing a crypto agility system is not without major trade-offs (consider how common downgrade attacks have been across protocols!).

groestl · on Dec 15, 2022

> Choosing a non-broken algorithm like SHA-2 was an easy choice they could have made that would still hold up today.

Yet, the requirement of the hashing algorithm for Git is not broken, it's not cryptographic but merely stochastic, and Linus knows this.

Why bother to produce a collision, when you have the power to get your changes pulled into a release branch? Your attack might be noticed, and your cover blown.

Instead, simply try to get a bug merged that results in a zero day. In case somebody discovers it, at least you have plausible deniability that it happened on accident.

fishywang · on Dec 15, 2022

>SHA-1 was already known to be broken at the time Git chose it

Please pardon my ignorance but could you elaborate on what time (e.g. the year) are you referring to?

LarryMullins · on Dec 15, 2022

Since about 2005, collision attacks against SHA-1 have been known. In 2005 Linus dismissed these concerns as impractical, writing:

    > The basic attack goes like this:
    >
    > - I construct two .c files with identical hashes.
    
    Ok, I have a better plan.
    
    - you learn to fly by flapping your arms fast enough
    - you then learn to pee burning gasoline
    - then, you fly around New York, setting everybody you see on fire, until
    people make you emperor.
    
    Sounds like a good plan, no?
    
    But perhaps slightly impractical.
    
    Now, let's go back to your plan. Why do you think your plan is any better
    than mine?

https://git.vger.kernel.narkive.com/9lgv36un/zooko-zooko-com...

chlorion · on Dec 16, 2022

This is a really good example of Torvalds toxic attitude and absolutely horrific attitude towards security. This is an occurring pattern unfortunately.

Git not being prepared for this is going to cost a lot of time and money for a very large amount of people, and it could have been trivially mitigated if security were taken seriously in the first place, and if Torvalds was mature enough to understand the he is not an expert on cryptography topics.

nighthawk454 · on Dec 15, 2022

I didn't know either. From Wikipedia [1], SHA-1 has been considered insecure to some degree since 2005. Following the citations, apparently it's been known since at least August 2004 [2] but maybe not demonstrated in SHA-1 until early 2005.

git's first release was in 2005, so I guess technically SHA-1 issues could've been known or suspected during development time.

More generously, it could've been somewhat simultaneous. It sounds like it was considered a state-sponsored level attack at the time, if collisions were even going to be possible. Don't know if the git devs knew this and intentionally chose it anyway, or just didn't know.

[1] https://en.wikipedia.org/wiki/SHA-1

[2] https://www.schneier.com/blog/archives/2005/02/cryptanalysis...

EDIT: sibling comment has evidence that Linus did in fact know about it and considered it an impractical vector at the time

https://git.vger.kernel.narkive.com/9lgv36un/zooko-zooko-com...

derbOac · on Dec 15, 2022

Why isn't Git using something else? Why go to the trouble of implementing something like that?

I don't mean that as some ridiculing criticism, I just am genuinely puzzled.

duskwuff · on Dec 15, 2022

> Why isn't Git using something else?

Because switching to a different hash algorithm would break compatibility with all existing Git clients and repositories.

chungy · on Dec 15, 2022

Changing out the hashing algorithm in Git is a very difficult thing to do.

cerved · on Dec 15, 2022

not within the git project so much as all the other code that depends on it

hejaodbsidndbd · on Dec 15, 2022

It’s not called SHA-1DC. It’s called “some blob of C whose behaviour is never described anywhere”.

xena · on Dec 15, 2022

If there's a readily-avaliable blob of C code that does the operation, then by definition it must be described somewhere. Maybe you should get ChatGPT to describe what it does.

amluto · on Dec 15, 2022

> SHA-1 is still perfectly fine for some applications like detecting duplicate files on a storage medium

If by “perfectly fine” you mean “subject to attacks that generate somewhat targeted collisions that are practical enough that people do them for amusement and excuses to write blog posts and cute Twitter threads”, then maybe I agree.

Snark aside, SHA-1 is not fine for deduplication in any context where an attacker could control any inputs. Do not use it for new designs. Try to get rid of it in old designs.

orev · on Dec 15, 2022

By “perfectly fine” they mean detecting duplicate image or document files on your local storage, which it’s still perfectly fine for, and a frequent mode of usage for these types of tools.

Not every tool needs to be completely resilient to an entire Internets’ worth of attacks.

bawolff · on Dec 15, 2022

Ask the webkit source code repo how using sha-1 for that purpose went.

It depends what you are doing, but deduplication where a collision means you loose data, seems like an inapropriate place for sha-1.

bornfreddy · on Dec 15, 2022

Webkit incident: https://itwire.com/business-it-news/security/fallout-webkit-...

cat_plus_plus · on Dec 17, 2022

Why would someone be so idiotic as to not compare actual data before deleting probable extra copies?

dagenix · on Dec 15, 2022

Why would you pick SHA1 which has significant attacks you need to consider as opposed to something like SHA2 that doesn't?

jcrawfordor · on Dec 15, 2022

Deduplication is the kind of application where CRC is a decent approach and CRC has no resistance to attack whatsoever. SHA1 adds the advantage of lower natural collision probability while still being extremely fast. It's important to understand that not all applications of hashing are cryptographic or security applications, but that the high degree of optimization put into cryptographic algorithms often makes them a convenient choice in these situations.

These types of applications are usually using a cryptographic hash as one of a set of comparison functions that often start with file size as an optimization and might even include perceptual methods that are intentionally likely to produce collisions. Some will perform a byte-by-byte comparison as a final test, although just from a performance perspective this probably isn't worth the marginal improvement even for hash functions in which collisions are known to occur but vanishingly rare in organic data sets (this would include for example MD5 or even CRC at long bit lengths, but the lack of mixing in CRC makes organic collisions much more common with structured data).

SHA2 is significantly slower than SHA1 on many real platforms, so given that intentional collisions are not really part of the problem space few users would opt for the "upgrade" to SHA2. SHA1 itself isn't really a great choice because there are faster options with similar resistance to accidental collisions and worse resistance to intentional ones, but they're a lot less commonly known than the major cryptographic algorithms. Much of the literature on them is in the context of data structures and caching so the bit-lengths tend to be relatively small in that more collision-tolerant application and it's not always super clear how well they will perform at longer bit lengths (when capable).

Another way to consider this is from a threat modeling perspective: in a common file deduplication operation, when files come from non-trusted sources, someone might be able to exploit a second-preimage attack to generate a file that the deduplication tool will errantly consider a duplicate with another file, possibly resulting in one of the two being deleted if the tool takes automatic action. SHA1 actually remains highly resistant to preimage and second preimage attacks, so it's not likely that this is even feasible. SHA1 does have known collision attacks but these are unlikely to have any ramifications on a file deduplication system since both files would have to be generated by the adversary - that is, they can't modify the organic data set that they did not produce. I'm sure you could come up with an attack scenario that's feasible with SHA1 but I don't think it's one that would occur in reality. In any case, these types of tools are not generally being presented as resistant to malicious inputs.

If you're working in this problem space, a good thing to consider is hashing only subsets of the file contents, from multiple offsets to avoid collisions induced by structured parts of the format. This avoids the need to read in the entire file for the initial hash-matching heuristic. Some commercial tools initially perform comparisons on only the beginning of the file (e.g. first MB) but for some types of files this is going to be a lot more collision prone than if you incorporate samples from regular intervals, e.g. skipping over every so many storage blocks.

dylan604 · on Dec 15, 2022

who is attacking you in this situation though? you're scanning the files on your local system and storing their hashes. you then look for duplicate hashes, and compare the files that created them. if the files are truly duplicates, you can now decide what to do about that. if they are not truly the same, then you claim to have found another case of collisions, write your blog post/twitthread and move on, but does that constitute being attacked?

sometimes, i really feel like people in crypto just can't detach themselves enough to see that just because they have a hammer, not everything in the world is a nail.

dagenix · on Dec 15, 2022

Why would you pick a function that is known to have issues when there are other functions that do the same thing but don't have known issues?

Your comparison is flawed. It's more like if you have a nail and next to it a workbench with two hammers - a good hammer and a not as good hammer. This isn't a hard choice. But for reasons that are unclear to me, people in this thread are insisting on picking the less good hammer and rationalizing why for this specific nail it isn't all that much worse. Just pick the better hammer!

oefrha · on Dec 15, 2022

Because people already have two decades of SHA-1 hashes in their database and a rewrite + rescan is completely pointless? Hell, I have such a system using md5. So you produced a hash collision, cool, now fool my follow-on byte-by-byte comparison.

Edit: Before anyone lecture me on SHA-1 being slow, yes, I use BLAKE2 for new projects.

VLM · on Dec 15, 2022

Its worship of the new. Nothing worth anything existed last week, so reimplementing an entire infrastructure is zero cost.

ShowalkKama · on Dec 15, 2022

because it's shorter, dealing with long hashes is annoying (I use md5, not sha1)

bawolff · on Dec 15, 2022

You could just discard half the sha256 hash. Using the first 16 bytes of sha256 is a lot more secure than using just md5, in which case you might as well just use crc32.

dylan604 · on Dec 15, 2022

thanks for the reply that did not answer the question asked.

dagenix · on Dec 15, 2022

Your question is irrelevant. If you don't care about security, SHA1 is a bad choice because there are faster hash functions out there. If you do care about security, SHA1 is a bad choice because it has known flaws and there exist other algorithms that don't. The only valid reason to use SHA1 is if there is a historical requirement to use it that you can't reasonably change.

Any analysis about how hard it is for an attacker to get a file on your local file system via a cloned got repo, cached file, email attachment, image download, shared drive, etc is just a distraction.

adrian_b · on Dec 15, 2022

You would be right, except that there are no faster hash functions (on all modern Intel, AMD and ARM CPUs, which have hardware SHA-1).

BLAKE 3 is faster only in wall clock time, on an otherwise idle computer, because it fully uses all CPU cores, but it does not do less work.

BLAKE 3 is preferable only when the computer does nothing else but hashing.

lazide · on Dec 16, 2022

Uh no.

On a modern intel CPU, one core of SHA1 does about 500MB/s worth of hashing. Blake3 on the same core is 1.5GB/s or faster.

oconnor663 · on Dec 16, 2022

On the ThinkPad I'm typing this on, the single-threaded BLAKE3 benchmarks hit 8.8 GB/s :)

bawolff · on Dec 15, 2022

I dont think this is what most people think of when they say deduplication. There are quite a few systems which will just scan for duplicates and then automatically delete one of the duplicates. In such a system sha1 would be inappropriate.

If you are just using sha1 as a heuristic you dont fully trust, i suppose sha1 is fine. It seems a bit of an odd choice though as something like MurmurHash would be much faster for such a use case.

dylan604 · on Dec 16, 2022

Most people. I haven't been part of that group, for like, ever maybe?

If we're a group of devs with a not insignificant percentage of those devs being frontend/UI/UX types, then having the same image in multiple sizes, formats, etc is going to be pretty common. Looking for multiples of the exact file is only going to reduce so much. Knowing you have a library of images with a source and then all of the derivatives is going to get you a lot less files as long as you know you have the source, then running image based sameness is much more beneficial. Sure, this is niche territory, but yeah, and, so?

Maybe there's someone new(-ish) that hasn't really had to deal with cleaning up thousands of images to this extent. One would hope the same image in its various forms within a dev's env would be similarly named, but that's not guaranteed. If we could depend on filenames, we wouldn't need hashing, right?

bawolff · on Dec 16, 2022

In some cases deduplication happens at the file system layer transparently without you even realizing it. E.g. there are tools like https://github.com/lakshmipathi/dduper

I agree that image editing workflows are a different use case more suited to perceptual hashes than cryptographic hashes.

adrian_b · on Dec 15, 2022

SHA-1 cannot be trusted only when there is a possibility that both files whose hashes are compared have been created by an attacker.

While such a scenario may be plausible for a public file repository, so SHA-1 is a bad choice for a version control system like GIT, there are a lot of applications where this is impossible, so it is fine to use SHA-1.

bawolff · on Dec 15, 2022

I'm not sure what scenarios there are where you have a possibility of the attacker creating 1 file but not both. Especially because the attacker doesn't need to fully control both files but could control only a prefix of one of them and still do the attack.

I also think working out all the possibilities is really hard, and using sha256 is really easy.

lokar · on Dec 15, 2022

If you are going to follow up with a full comparison you can use something much cheaper then sha-1

adrian_b · on Dec 15, 2022

SHA-1 is implemented in hardware in all modern CPUs and it is much faster than any alternatives (not all libraries use the hardware instructions, so many popular programs compute SHA-1 much more slowly than possible; OpenSSL is among the few that use the hardware).

When hashing hundreds of GB or many TB of data, the hash speed is important.

When there are no active attackers and even against certain kinds of active attacks, SHA-1 remains secure.

For example, if hashes of the files from a file system are stored separately, in a secure place inaccessible for attackers (or in the case of a file transfer the hashes are transferred separately, through a secure channel), an attacker cannot make a file modification that would not be detected by recomputing the hashes.

Even if SHA-1 remains secure against preimage attacks, it should normally be used only when there are no attackers, e.g. for detecting hardware errors a.k.a. bit rotting, or for detecting duplicate data in storage that could not be accessed by an attacker.

While BLAKE 3 (not BLAKE 2) can be much faster than SHA-1, all the extra speed is obtained by consuming proportionally more CPU resources (extra threads and SIMD). When the hashing is done in background, there is no gain by using BLAKE 3 instead of SHA-1, because the foreground tasks will be delayed by the time gained for hashing.

Only when a computer does only hashing, BLAKE 3 is the best choice, because the hash will be computed in a minimal time, by fully using all the CPU cores.

oconnor663 · on Dec 16, 2022

> all the extra speed is obtained by consuming proportionally more CPU resources (extra threads and SIMD)

If you know you have other threads that need to do work, then yes, multithreading BLAKE3 would just pointlessly compete with those other threads. But I don't think the same is true of SIMD. If your process/thread isn't using vector registers, it's not like some other thread can borrow them. They just sit idle. So if you can make use of them to speed up your own process, there's very little downside. AVX-512 downclocking is the most notable exception, and you'd need to benchmark your application to see whether / how much that hurts you. But I think in most other cases, any power draw penalty you pay for using SIMD is swamped by the race-to-idle upside. (I don't have much experience measuring power, though, and I'd be happy to get corrected by someone who knows more.)

dreamcompiler · on Dec 15, 2022

Because SHA1 is faster and that matters if you're hashing several TB of files?

But I don't know for sure that's the case.

amluto · on Dec 15, 2022

First the preamble:

DO NOT USE SHA-1 UNLESS IT’S FOR COMPATIBILITY. NO EXCUSES.

With that out of the way: SHA-1 is not even particularly fast. BLAKE2-family functions are faster. Quite a few modern hash functions are also parallelizable, and SHA-1 is not. If for some reason you need something faster than a fast modern hash, there are non-cryptographic hashes and checksums that are extraordinarily fast.

If you have several TB of files, and for some reason you use SHA-1 to dedupe them, and you later forget you did that and download one of the many pairs of amusing SHA-1 collisions, you will lose data. Stop making excuses.

VLM · on Dec 15, 2022

> there are non-cryptographic hashes and checksums that are extraordinarily fast.

Is it still true that CRC32 is only about twice as fast as SHA1?

Yeah I know the XX hashes are like 30 times faster than SHA1.

A lot depends on instruction set and processor choice.

Maybe another way to put it is I've always been impressed that on small systems SHA1 is enormously longer but only twice as slow as CRC32.

For a lot of interoperable-maxing non-security non-crypto tasks, CRC32 is not a bad choice, if its good enough for Ethernet, zmodem, and mpeg streams its good enough for my telemetry packets LOL. (IIRC iSCSI uses some variant different formulae)

adrian_b · on Dec 15, 2022

CRC32 is fine for kilobyte-sized data packets.

For files, it is useless. Even if that was expected, I have computed CRC32 for all the files on an SSD. Of course, I have found thousands of collisions.

Dylan16807 · on Dec 15, 2022

Birthday-style collisions don't matter for integrity checking.

32 bits is too small to do the entire job of duplicate detection, but if it's fast enough then you can add a more thorough second pass and still save time.

Pet_Ant · on Dec 15, 2022

> where an attacker could control any inputs

I believe the GP's point hinges on the word "attacker". If you aren't in a hostile space, like just your won file server and you are monitoring your own backups it's fine. I still use MD5s to version my own config files. For personal use in non-hostile environments these hashes are still perfectly fine.

hannob · on Dec 15, 2022

> SHA-1 is still perfectly fine for some applications like detecting duplicate files on a storage medium

That's what the developers of subversion thought, but they didn't anticipate that once colliding files were available people would commit them to SVN repos as test cases. And then everything broke: https://www.bleepingcomputer.com/news/security/sha1-collisio...

TheRealPomax · on Dec 15, 2022

That changes the parameters quite a bit though. For local digests, like image deduplication of your own content, on your own computers, sha-1 is still perfectly fine. Heck, even MD5 is still workable (although more prone to collide). Nowhere in that process is the internet, or "users" or anything else like that involved =)

You use digests to quickly detect potential collisions, then you verify each collision report, then you delete the actual duplicates. Human involvement still very much required because you're curating your own data.

dylan604 · on Dec 15, 2022

If we're talking specifically image deduplication, then a hash comparison is only going to find you exact matches. what about the image deduplication of trying to find alt versions of things like scaling, different codecs, etc?

if you want to dedupe images, some sort of phashing would be much better so that the actual image is considered vs just the specific bits to generate the image.

TheRealPomax · on Dec 15, 2022

Depends on the images. For photographs, a digest is enough. For "random images downloaded from the web", or when you're deduplicating lots of user's data, sure, you want data appropriate digests, like SIFT prints. But then we're back to "you had to bring network content back into this" =)

dreamcompiler · on Dec 15, 2022

Very true but if the hash matches the images are guaranteed to match too. That's my first pass when deduping my drives. My second pass is looking for "logically equivalent" images.

oconnore · on Dec 15, 2022

If you need a shorter hash just truncate a modern hash algorithm down to 160 or 128 bits. Obviously the standard lengths were chosen for a reason, but SHA2-256/160 or SHA2-256/128 are better hash functions than SHA1 or MD5, respectively. Blake2b/160 is even faster than SHA1!

(I suspect this would be a good compromise for git, since so much tooling assumes a 160 bit hash, and yet we don't want to continue using SHA1)

danbruc · on Dec 15, 2022

Just as a note, the primary reason for the truncated variants is not to get a shorter hash but to prevent extension attacks. For variants without truncation, the final hash is the entire internal state, therefore an attacker can calculated the hash of any message that starts with the original message and then has additional content without knowing the original message. Truncating the hash denies access to the complete internal state and makes this impossible.

cesarb · on Dec 15, 2022

Another way to prevent extension attacks is to make the internal state different whenever the current block is the last block, as done for instance in BLAKE3 (which has as an additional input on each block a set of flags, and one of the flags says "this is the last block").

GTP · on Dec 15, 2022

Or use Keccak (which is the permutation that was chosen for SHA3), which lets you pick the length that you need.

kzrdude · on Dec 15, 2022

Git has already implemented a solution based on SHA-2 with 256 bit output so that's unlikely to be changed for the time being. (But it has not really been launched in earnest, only as a preview feature.)

kiratp · on Dec 15, 2022

As an industry we need to get over this pattern of scoping down usage of something that has failed it’s prime directive. People still use MD5 in secure related things because it’s been allowed to stick around without huge deprecation warnings in libraries and tools.

SHA1 (and MD5) need to be treated the same way you would treat O(n^2) sorting in a code review for a PR written by a newbie.

0xbadcafebee · on Dec 15, 2022

If you're only using the hash for non-cryptographic applications, there are much faster hashes: https://github.com/Cyan4973/xxHash

SHA1 and MD5 are the most widely accessible, though, and I agree it's fine to use them if you don't care about security.

EGreg · on Dec 15, 2022

Actually SHA1-HMAC is fine still. Although I had this question: https://crypto.stackexchange.com/questions/102990/what-is-mo...

oconnore · on Dec 15, 2022

Since I'm on a roll with truncated hashes: SHA2-512/384, and SHA2-512/256 are not vulnerable to a Merkle-Damgard length extension attack.

Thus, a construct like hash(key + message) can be used similar to SHA3 [1]

[1]: https://keccak.team/keccak_strengths.html

classichasclass · on Dec 15, 2022

This is worth emphasizing since HOTP (and thus TOTP) are based on it.

bawolff · on Dec 15, 2022

But sha-1 is now both slow and insecure.

If you dont care about security, use a faster hash. If you care about security use sha256 (which is about the same speed anyways).

The only valid reason to still use it in non-security critical roles is backwards compat.

VLM · on Dec 15, 2022

“We recommend that anyone relying on SHA-1 for security migrate to SHA-2 or SHA-3 as soon as possible.” —Chris Celi, NIST computer scientist

The emphasis being on "for security"

I've also used SHA-1 over the years for binning and verifying file transfer success, none of those are security related.

Sometimes, if you make a great big pile of different systems, what's held in common across them can be weird, SHA-1 popped out of the list so we used it.

I'm well aware its possible to write or automate the writing of dedicated specialized "perfect" hashing algos to match the incoming data, to bin the data more perfectlyier, but sometimes its nice if wildly separate systems all bin incoming data the same highly predictable way thats "good enough" and "fast enough".

jraph · on Dec 15, 2022

> I've also used SHA-1 over the years for binning and verifying file transfer success, none of those are security related.

It could. If you want to verify that the file has not been tempered by someone, it is security related.

VLM · on Dec 17, 2022

Prob a poor choice of words on my part.

Verified as in "is this file completely transferred or not?"

non-security critical data, I just want a general idea if its valid or the file transfer failed half way thru or the thing sending it went bonkers and just sent trash to us.

Another funny file transfer use: Send me a file of data every hour. Is the non-crypto-hash new or the same old hash? If its the same old hash, those clowns sent me the same file twice, I'm supposed to get a new one. Yes I know I can dedupe "easily" but not as "easily" as sha-1. And some application layer software like MySQL can directly generate SHA1 as a function in the query. Its really quite handy sometimes!

LinuxBender · on Dec 15, 2022

SHA-1 is still perfectly fine for some applications like detecting duplicate files on a storage medium

Absolutely agree, especially when speed is a workable trade-off and accepting real world hash collisions are unlikely and perhaps an acceptable risk. For financial data, especially files not belonging to me I would have md5+sha1+sha256 checksums and maybe even GPG sign a manifest of the checksums ... because why not. For my own files md5 has always been sufficient. I have yet to run into a real world collision.

FWIW anyone using `rsync --checksum` is still using md5. Not that long ago I think 2014 it was using md4. I would be surprised if rsync started using anything beyond md5 any time soon. I would love to see all the checksum algorithms become CPU instruction sets.

    Optimizations:
        no SIMD-roll, no asm-roll, no openssl-crypto, asm-MD5
    Checksum list:
        md5 md4 none
    Compress list:
        zstd lz4 zlibx zlib none
    Daemon auth list:
        md5 md4

tptacek · on Dec 15, 2022

You're never going to _accidentally_ run into an MD5 collision.

pornel · on Dec 16, 2022

Even if git didn't have protection against the known attack, it's still safe in practice.

The SHA-1 collision attack can only work if you take a specially-crafted file from the attacker and commit it to your repository. The file needs to have a specific structure, and will contain binary data that looks like junk. It can't look like innocent source code. If you execute unintelligible binary blobs from strangers, you're in trouble anyway.

There is no preimage weakness in SHA-1, so nobody is able to change or inject new data to an arbitrary repo/commit that doesn't already contain their colliding file.

umvi · on Dec 15, 2022

What kind of attack against a git repo can be realistically carried out today?

kadoban · on Dec 15, 2022

Given a git repo with signed commits I can give you the same repo with some file(s) changed with the signatures intact.

Pretty sure you can anyway, I haven't thought deeply about the git file formats involved and such.

Note that this attack isn't _that_ serious. There's not a lot of cases where this would make sense.

AnssiH · on Dec 15, 2022

I don't think so, unless you utilize some not-yet-public vulnerability.

As far as I know, with current public SHA-1 vulnerabilities, you can create two new objects with the same hash (collision attack), but cannot create a second object that has the same hash as some already existing object (preimage attack).

kadoban · on Dec 15, 2022

My bad, yep you're right. So you could only either give 2 people different git repos that should be the same or I guess you could submit a collided file into a repo you can submit changes to (eg a public one that accepts PRs) and give someone else the other version.

Given the limitations, really not too practical.

dragontamer · on Dec 15, 2022

I feel like the properties of CRC make them superior for that task in most cases though. (CRC8, CRC16, CRC32 and CRC64, maybe CRC128 if anyone ever bothered going that far)

In particular, CRC guarantees detection on all bursts of the given length. CRC32 protects vs all bursts of length 32 bits.

IncRnd · on Dec 15, 2022

> I feel like the properties of CRC make them superior for that task in most cases though.

THIS IS FALSE. Please do not ever do this. Why not? For example, by controlling any four contiguous bytes in a file, the resultant 32bit CRC can be forced to take on any value. A CRC is meant to detect errors due to noise - not changes due to a malicious actor.

Program design should not be done based upon one's feelings. CRCs absolutely do not have the required properties to detect duplication or to preserve integrity of a stored file that an attacker can modify.

dragontamer · on Dec 15, 2022

> THIS IS FALSE. Please do not ever do this. Why not? For example, by controlling any four contiguous bytes in a file, the resultant 32bit CRC can be forced to take on any value.

And SHA1 is now broken like this, with collisions and so forth. Perhaps it's not as simple as just 4 bytes, but the ability to create collisions is forcing this retirement.

If adversarial collisions are an issue, then MD5 and SHA1 are fully obsolete now. If you don't care for an adversary, might as well use the cheaper, faster CRC check.

------

CRC is now more valid use case than SHA1. That's the point of this announcement.

IncRnd · on Dec 15, 2022

> CRC is now more valid use case than SHA1. That's the point of this announcement.

No. That isn't the point of this announcement. This announcement codifies a transition time for SHA-1 use in SBU and has nothing to do with a CRC.

Dylan16807 · on Dec 15, 2022

The point of the announcement is to give a timeline to start treating SHA1 as having no real security.

This is relevant to when you'd choose CRC, because it also has no real security.

IncRnd · on Dec 16, 2022

> The point of the announcement is to give a timeline to start treating SHA1 as having no real security.

That's also false. There is a large body of knowledge here that you aren't expressing in your comments. That leads me to see that you are unfamiliar with the purposes of hash functions and their utility in real world situations.

The announcement refers to the transition timeline to stop using SHA-1, preferring the SHA-2 and SHA-3 families. However, the recommendations for years from NIST have been not to use SHA-1. For example SP 800-131Ar2 discusses not to use SHA-1 for digital sig gen and that digital sig ver is only acceptable for legacy uses.

The recommendation would have been for years to not use SHA-1 at all, except for this carve-out to handle already stored data that uses SHA-1. The remaining use cases cover protocol use, such as TLS, where SHA-1 is used as a component in constructs and not solely as a primitive.

Dylan16807 · on Dec 16, 2022

> That leads me to see that you are unfamiliar with the purposes of hash functions and their utility in real world situations.

Don't post like this, please.

AlotOfReading · on Dec 15, 2022

There are only a few examples of anything larger than CRC64 being characterized and they're not very useful.

For the sake of the next person who has to maintain your code though, please choose algorithms that adequately communicate your intentions. Choose CRCs only if you need to detect random errors in a noisy channel with a small number of bits and use a length appropriate to the intended usage (i.e. almost certainly not CRC64).

dragontamer · on Dec 15, 2022

And when you choose SHA1, does it mean you understood that it's no longer secure? Or is it chosen because it was secure 20 years ago but the code is old and needs to be updated?

CRC says that you never intended security from the start. It's timeless, aimed to prevent burst errors and random errors.

--------

BTW, what is the guaranteed Hamming distance between SHA1? How good is SHA1 vs burst errors? What about random errors?

Because the Hamming distances of CRC have been calculated and analyzed. We actually can determine, to an exact level, how good CRC codes are.

dagenix · on Dec 15, 2022

If you are choosing between a CRC and SHA1, you probably need to reconsider your understanding of the problem you are trying to solve. Those algorithms solve different use cases.

dragontamer · on Dec 15, 2022

If you are choosing SHA1, now that it is retired, you probably should think harder about the problem in general.

CRC should be better for any error detection code issue. Faster to calculate, more studied guaranteed detection modes, and so forth.

SHA1 has no error detection studies. It's designed as a cryptographic hash, to look random. As it so happens, it is more efficient to use other algorithms and do better than random if you have a better idea of how your error looks like.

Real world errors are either random or bursty. CRC is designed for these cases. CRC detects the longest burst possible for it's bitsize.

AlotOfReading · on Dec 15, 2022

You shouldn't choose SHA-1, that's the point of this announcement. Seeing it indicates both that there was the potential for malicious input and that the code is old. The appropriate mitigation is to move to a secure hash, not CRCs. You may not know the bounds and distances exactly, but you know them probabilistically. Bit errors almost always map to a different hash.

The same is true of CRCs over a large enough input as an aside.

benlivengood · on Dec 15, 2022

https://eprint.iacr.org/2019/078 is a good overview of the statistical properties of SHA hashes.

Basically after ~10 rounds the output is always indistinguishable from randomness which means hamming distance is what you'd expect (about half the bits differ) between the hashes of any two bitstreams.

dragontamer · on Dec 15, 2022

CRC does better than random vs burst errors. And vs random errors is indistinguishable from any other methodology.

If you are only worried about random errors, might as well use chksum.

null0ranje · on Dec 15, 2022

> NIST is announcing that SHA-1 should be phased out by Dec. 31, 2030

Looks like it'll limp along for a while yet

dang · on Dec 15, 2022

We've updated the title with a representative sentence from the article which makes this clearer.

zachruss92 · on Dec 15, 2022

It’s about time haha. Once they found real world conflicts it needed to be phased out for anything security related. At least it isn’t very hard to move from SHA-1 to SHA-256.

px43 · on Dec 15, 2022

AKA, SHA-2 :-)

SAI_Peregrinus · on Dec 15, 2022

SHA-2 is an entire family of hash functions: SHA-228, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256.

miga · on Dec 15, 2022

That is date after which purchase of _new_ software with SHA-1 will be forbidden, which seems late given that it takes only two months on GPU to find signature collision with a chosen prefix.

Sounds like the deprecation schedule is too slow and unsafe.

snhly · on Dec 15, 2022

Other than legacy stuff, I think the main reason programmers still use it is the fact that '1' is easier for them to call to mind than '256', or whatever else. So when you're throwing a script together really quickly, you'll prefer to type SHA1 instead of anything else. At least that's how my brain works.

ComplexSystems · on Dec 15, 2022

Any estimate on how long SHA 2 will last?

tptacek · on Dec 15, 2022

Indefinitely. It's fine. The message expansion in SHA2 is totally different from that of SHA1 (hash message expansion is analogous to cipher key scheduling), and forecloses on the SHA1 attacks. JP Aumasson, part of the BLAKE team, has suggested SHA2 might never be broken; that's hyperbolic but gives a sense of how little research is on the horizon to threaten it.

The big issues with SHA2 are MD structure/length extension (which HMAC addresses, and you can also use the truncated versions; length extension matters pretty much exclusively if you're designing entire new protocols) and speed.

I'd reach for Blake2 right now instead of SHA2 (or SHA3) in a new design, but I wouldn't waste time replacing SHA2 in anything that already exists, or put a lot of effort into adding a dependency to a system that already had SHA2 just to get Blake2.

miga · on Dec 15, 2022

Then why does NIST warn that we might need PQC algorithms by 2035?

Or is this claim ignoring progress of quantum computing?

luhn · on Dec 15, 2022

Quantum computing threatens factorization and elliptic curves, i.e. RSA and ECDSA. Hash functions are considered relatively safe. The NIST's PQC standardization is focused on public key cryptography—I can't find any initiatives from them working with post-quantum hashing.

dchest · on Dec 15, 2022

Here's one:

"neither us nor our children will see a SHA-256 collision (let alone a SHA-512 collision)" -- JP Aumasson https://twitter.com/veorq/status/652100309599264768

woodruffw · on Dec 15, 2022

SHA-2 uses the same construction as SHA-1, but my understanding is that there are no practical collision (much less preimage) attacks against full-round SHA-2.

greggarious · on Dec 15, 2022

Nice, though it's scary many open source projects still just list an MD5 sum for the binaries.

yrro · on Dec 15, 2022

I wonder if Windows' Kerberos implementation will ever support the aes128-cts-hmac-sha256-128/aes256-cts-hmac-sha384-192 encryption types. It's been stuck on aes128-cts-hmac-sha1-96/aes256-cts-hmac-sha1-96 since Vista...

upofadown · on Dec 15, 2022

I wonder what NIST does for a case where SHA-1 is not used for any cryptographic properties? I recently ran into that for OpenPGP. The authenticated encryption mode uses SHA-1 because it was otherwise used in the standard but because of how things work only a non-cryptographically secure hash is required.

I have somewhat jovially suggested that the OpenPGP standard should just rename it if it turns out that the name becomes a problem...

twiss · on Dec 15, 2022

FWIW, the "crypto refresh" of the OpenPGP standard specifies the use of AEAD for authenticated encryption, replacing the SHA-1 "modification detection code". Even if the latter isn't broken, the former is more modern and more performant, and so retconning the use of SHA-1 for optics shouldn't be necessary once everyone moves to that :)

upofadown · on Dec 15, 2022

Even assuming that anything real comes out of the "crypto refresh" (it seems to be going the way of the last effort) it would still be irresponsible to promote the use of an entirely incompatible encryption mode for no real reason. The existing standard has been in use for over 20 years now. Even if there were some sort of security weakness discovered it would make more sense to try to resolve that issue in some sort of backwards compatible way. Since there is no weakness then the best course of action seems pretty clear and is really easy to do. The crypto refresh is currently attempting to add no less than three new authenticated encryption modes in addition to the existing one for a total of four. Each and every mode is entirely incompatible with the others. This strikes me as sort of nuts...

twiss · on Dec 15, 2022

Only one of them, OCB, is mandatory to implement. The other two are optional, but there may be performance or compliance reasons to use one of them. E.g., people who care about FIPS compliance may want to use GCM.

In any case, since there are feature flags and algorithm preferences to signal support for these, all of this is in fact backwards compatible. There's little risk that someone will accidentally use an AEAD mode to encrypt a message for a recipient that doesn't support it, since the recipient needs to signal support for it in their public key.

And, offering performance and security benefits for those that care to upgrade their implementations and keys is still a worthy goal, IMHO.

upofadown · on Dec 15, 2022

>The other two are optional,...

That just makes things worse. The OpenPGP standard covers the offline, static encryption case. You have an encrypted file or message. If your implementation has implemented the encryption method then you can decrypt it. If it doesn't than you can't. Contrast this with an online, dynamic method like TLS where you can negotiate a method at the time of encryption and decryption.

>In any case, since there are feature flags and algorithm preferences to signal support for these, all of this is in fact backwards compatible.

The OpenPGP preferences are not applicable to symmetric encryption. In that case the implementation has to guess what method will be supported by the decrypting implementation. In most cases it would make sense to use the most widely implemented method. That is always going to be the one that has been used since forever.

The OpenPGP preferences are included in the public key at key generation time. They reflect the capabilities of that implementation. The public key then has a life of it's own. It is quite normal to use that key to encrypt material for other implementations. Having optional methods, again, makes things much worse here. This is not a theoretical problem. I have already been involved in a difficult usability issue caused by an implementation producing one of these new incompatible encryption modes. The file was created on the same implementation as the public key. So the implementation saw that the reciepient supported all the same methods that it did. But the actual implementation that did the decryption did not support that mode. This sort of interoperability problem, even if it only happens from time to time, seriously impacts usability. That is the ultimate issue. Why are we making things harder for the users of these systems for no real reason?

twiss · on Dec 15, 2022

> The OpenPGP preferences are not applicable to symmetric encryption. In that case the implementation has to guess what method will be supported by the decrypting implementation. In most cases it would make sense to use the most widely implemented method. That is always going to be the one that has been used since forever.

I agree, if you don't know the capabilities of the implementation decrypting the message then it makes sense to be conservative, and wait until AEAD is widely implemented before using it.

> I have already been involved in a difficult usability issue caused by an implementation producing one of these new incompatible encryption modes.

I also think that generating AEAD encrypted OpenPGP messages today is irresponsible, since the crypto refresh is still a draft, not an RFC yet. Even if the decrypting implementation can read the message today, the draft could still change (although now that it's in Working Group Last Call it's unlikely), and then you'd have an even bigger problem.

But I think that's the fault of the implementation, not of the (proposed) standard. If we eventually want to have the security and performance benefits of AEAD, we have to specify it today (well, or yesterday, but that's a bit hard to change now ^.^).

upofadown · on Dec 15, 2022

>If we eventually want to have the security and performance benefits of AEAD,...

The thing is, there doesn't seem to be any security weaknesses with the existing AE method. I have looked hard. There also doesn't seem to be any need for the AD (associated data) part.

OCB seems to be the fastest out of all of them. If the proposal was just to add on OCB as an enhanced performance mode then I might be OK with that. Why make the people encrypting multi-TB files wait? I am mostly grumpy with the idea that we have to drop an existing well established standard for stuff like messaging and less extreme file sizes.

twiss · on Dec 15, 2022

> The thing is, there doesn't seem to be any security weaknesses with the existing AE method. I have looked hard.

Even if nobody found any concrete security issues, there might be compliance reasons not to use SHA-1, as indicated by the OP. It's easier to switch to something new than to endlessly explain to auditors that actually in this particular case the use of SHA-1 is probably fine. Also note that there's no security proof for the MDC, making it harder to convince such auditors.

> There also doesn't seem to be any need for the AD (associated data) part.

I personally disagree: https://gitlab.com/openpgp-wg/rfc4880bis/-/issues/145. The paper linked there explains why having AD would be useful. But I'll grant you that the crypto refresh doesn't make significant use of it yet, so it can't really be counted as an advantage for AEAD yet.

> OCB seems to be the fastest out of all of them. If the proposal was just to add on OCB as an enhanced performance mode then I might be OK with that. Why make the people encrypting multi-TB files wait? I am mostly grumpy with the idea that we have to drop an existing well established standard for stuff like messaging and less extreme file sizes.

I'm not sure I understand what the concrete difference would be between what you're proposing and what the crypto refresh does? It introduces a new mode and encourages its use, but doesn't disallow the use of the current mode. That being said, once OCB is widely deployed, why would you want to use CFB instead of OCB?

upofadown · on Dec 16, 2022

>Also note that there's no security proof for the MDC,...

I am not sure what aspect of the MDC you would want to prove. The security properties of such constructions are well understood at this point[1]. It is a simple scheme. It has been under scrutiny for 20+ years. It isn't really possible to prove a scheme secure in general. What happens in practice is that it turns out that an assumption is incorrect. The MDC requires few assumptions and is probably more secure than other more complex schemes.

>The paper linked there explains why having AD would be useful.

I don't find that very compelling. The scheme described talks about a false positive rate. Such a thing would not be acceptable in normal PGP usage. I am also not convinced that such a scheme is impossible without AD.

As already mentioned, I am concerned about the ethics of changing a long term standard for no good reason. The users do not deserve the wave of low level interoperability problems such a proposal would create. Usability is a serious issue for end to end encrypted messaging of all types and should be prioritized.

[1] https://link.springer.com/content/pdf/10.1007/3-540-44987-6_...

kzrdude · on Dec 15, 2022

Yes, they say "Modules that still use SHA-1 after 2030 will not be permitted for purchase by the federal government" so I wonder... does that mean products that can read 2022-era git repos (using sha-1) are they no longer allowed by that date, and so on? There must be exceptions by use case!

klabb3 · on Dec 15, 2022

Use is quite vague in this context, at least having a distinction between read/write would be more clear.

woodruffw · on Dec 15, 2022

I'm not sure I understand what you mean by PGP not requiring SHA-1's cryptographic properties. Do you mean that PGP's authenticated encryption mode only requires preimage resistance?

upofadown · on Dec 15, 2022

Not even that. It only requires that the hash is not reversible.

woodruffw · on Dec 15, 2022

Can you elaborate? This doesn't match my understanding of how "authenticated" encryption works in PGP (by which I assume you mean MDC, which is closer to integrity-without-identity than authentication).

upofadown · on Dec 15, 2022

For most PGP use, the MDC only serves as an integrity check [1]. That is the same for the proposed modes as well. In the case of symmetrical encryption it does in fact serve to authenticate the encrypted material based on the passphrase.

It does not use the popular combination of an encryption function acting more or less independently of a MAC (message authentication code). It uses a different method[2]. This seems to cause much confusion.

[1] https://articles.59.ca/doku.php?id=pgpfan:authenticated

[2] https://articles.59.ca/doku.php?id=pgpfan:mdc

Spooky23 · on Dec 15, 2022

If you’re in a regulated industry and required to run FIPS validated modules in FIPS mode, usually you lose access to system implementations of the removed algorithm.

anon223345 · on Dec 18, 2022

That’s hilarious… 2030?? SHA-1 can be broken stupid fast

charlieyu1 · on Dec 15, 2022

At least 20 years too late

jodrellblank · on Dec 15, 2022

Who named it "Secure Hash Algorithm" :|

Now the name will be annoyingly and misleadingly wrong forever, in a way that was totally predictable.

GTP · on Dec 15, 2022

That was the name of the standard, for which they had to pick a "Secure Hash Algorithm". And it's normal to have to update standards over time, regardless of the name. Which name would you have suggested for a standard defining a secure hash function? Usually it's better to not use fantasy names for standards, and "A Hash Function That Is Secure Now But Maybe Not Anymore In 10 Years" it's a bit too verbose in my opinion.

lapetitejort · on Dec 15, 2022

The same concept applies to telescopes. Every year a "Ridiculously Huge Telescope" gets announced. If I named them I'd start small: "Decently Sized Telescope"; "Mildly Safe Hashing Algorithm".

Dylan16807 · on Dec 15, 2022

Huge Telescope 1. Huge Telescope 2. Huge Telescope 3.

If we do the same thing as the hash names, I think it's fine. "SHA-1", specifically with the "1", doesn't cause problems.

itsTyrion · on Dec 20, 2022

Exactly. Look at video codecs. H.264 is named AVC - Advanced Video Codec. now we have VP9, HEVC(H.265), AV1 and VVC(H.266) being better.

Or AAC - Advanced Audio Codec. Fast forward a few years, Opus came out and you can reduce the bitrate by another 30-50% with similar performance

dolni · on Dec 15, 2022

Change "Secure" for "FIPS" and you have a decent name. FHA-1 -- FIPS hash algorithm, version 1.

GTP · on Dec 16, 2022

Yes, this could have worked. But unfortunately it's too late now: changing the name of the standard now would create confusion as well.