BLAKE2: “Harder, Better, Faster, Stronger” Than MD5 (2014) (leastauthority.com)
40 points by 0xedb on Jan 23, 2022 | 49 comments




Wow, I was totally unaware of the 5x performance speedup over SHA-1 and BLAKE2b. Even more over the others!

Are there cases where the benchmark doesn't hold true? Different architectures?

I'm inclined to use this on some less security sensitive things I need a hash for.


> Are there cases where the benchmark doesn't hold true? Different architectures?

Yes, that 5x figure represents a big speedup that comes from using SIMD (specifically AVX-512) to compress multiple blocks in parallel. To take full advantage of AVX-512, BLAKE3 needs to have at least 16 KiB of input. So for any input shorter than that, the speedup compared to BLAKE2b/s will be smaller. For inputs less than 2 KiB, the only speedup is the ~1.4x that you get from the round reduction.
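
If you want to see the size dependence on your own machine, here's a rough micro-benchmark sketch (not the official benchmarks; it assumes the blake3 and blake2 Rust crates and just times whole-message hashing):

    // Rough sketch: compare BLAKE3 vs BLAKE2b throughput at a few input sizes.
    // Assumes Cargo dependencies on the blake3 and blake2 crates.
    use blake2::{Blake2b512, Digest};
    use std::time::Instant;

    fn main() {
        for &size in &[1usize << 10, 1 << 14, 1 << 20, 1 << 24] {
            let data = vec![0u8; size];
            let iters = (1 << 26) / size + 1; // keep total work roughly constant

            let t = Instant::now();
            for _ in 0..iters {
                std::hint::black_box(blake3::hash(&data));
            }
            let b3 = t.elapsed();

            let t = Instant::now();
            for _ in 0..iters {
                std::hint::black_box(Blake2b512::digest(&data));
            }
            let b2 = t.elapsed();

            println!("{:>10} bytes: blake3 {:?} vs blake2b {:?}", size, b3, b2);
        }
    }

On an AVX-512 machine you'd expect the gap to widen as the input grows past 16 KiB; below ~2 KiB the two should be much closer.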

Architectures other than x86 tend to have less in the way of SIMD. ARM has NEON, but that's currently only 128 bits wide. So the big single-threaded speedups are only on x86 today. (For multithreading speedups, the architecture doesn't matter as much. Just the number of cores you have.)

> I'm inclined to use this on some less security sensitive things I need a hash for.

This makes sense insofar as BLAKE3 is a new design, and it pays for cryptographic applications to be conservative. But to be clear, if BLAKE3 turns out not to uphold the same security properties as SHA-2 or BLAKE2s, that would represent a catastrophic failure of the design. (It's possible that some flaw could be discovered that affects BLAKE2 and BLAKE3 equally, but that the lower round count of BLAKE3 would make the flaw more severe. In that case, I'd expect everyone would also want to migrate off BLAKE2 pretty quickly anyway.)


Interesting then that the latest Intel chips are dropping AVX-512.

Makes me wonder if BLAKE3 will be changed to suit, or if Intel chips are just going to suck when it comes to performance with the standard?


I'm not sure any changes are necessary. The implementations will use AVX-512 where it's available, and they'll fall back to something else (often AVX-2) if it's not.


Check out https://github.com/Cyan4973/xxHash. I use it for performance-sensitive non-crypto stuff.


There shouldn't be any meaningful reduction in security, afaik. Most of the performance comes from reducing the number of rounds, since the extra rounds were shown to be unnecessary, iirc.


BLAKE3 only reduces the rounds to 7, from 10 in BLAKE2. That accounts for only a roughly 1.4x speedup, not the >5x single-threaded speedup seen in BLAKE3. Most of the speedup comes from the hashing mode, which breaks the input into a tree of independent chunks and then uses SIMD to hash several of them in parallel. Additionally, the Rust implementation allows even more parallelism using threads.
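
To illustrate, here's a minimal Rust sketch of the two APIs (assuming the blake3 crate with its optional "rayon" feature enabled); both paths produce the same digest because of the tree structure:

    // Minimal sketch: update() is single-threaded (SIMD across chunks internally),
    // while update_rayon() additionally farms subtrees out to a Rayon thread pool.
    fn main() {
        let data = vec![0u8; 1 << 24]; // 16 MiB of demo input

        let mut single = blake3::Hasher::new();
        single.update(&data);

        let mut multi = blake3::Hasher::new();
        multi.update_rayon(&data); // requires the "rayon" feature

        // The tree mode makes the result identical regardless of threading.
        assert_eq!(single.finalize(), multi.finalize());
        println!("{}", single.finalize().to_hex());
    }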


Still, BLAKE3 is not a ton faster for short inputs, especially if your implementation has a lot of overhead for parallelism, like the threading you mentioned in the Rust implementation.

It's good for files and similar cases where your data stream is long enough to make effective use of the tree structure.


Gotcha, I wasn't sure if it was the parallelism or the reduced rounds that made the bigger difference.


Hoping to see a BLAKE3 implementation in whatever library is necessary for distros like Arch to verify package checksums soon.


Oh neat. I was doing some file matching over the weekend (checking if files are the same by comparing size and SHA-256) and the file hashing certainly didn't help its speed. This should give me a nice speed boost.
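
For what it's worth, here's a minimal sketch of that size-then-hash check using the blake3 crate (the buffer size and file names are just placeholders):

    use std::fs::File;
    use std::io::Read;

    // Stream a file through BLAKE3 without loading it all into memory.
    fn hash_file(path: &str) -> std::io::Result<blake3::Hash> {
        let mut file = File::open(path)?;
        let mut hasher = blake3::Hasher::new();
        let mut buf = vec![0u8; 64 * 1024];
        loop {
            let n = file.read(&mut buf)?;
            if n == 0 {
                break;
            }
            hasher.update(&buf[..n]);
        }
        Ok(hasher.finalize())
    }

    fn main() -> std::io::Result<()> {
        let (a, b) = ("a.bin", "b.bin"); // placeholder paths
        // Compare sizes first (cheap); only hash when the sizes match.
        if std::fs::metadata(a)?.len() == std::fs::metadata(b)?.len() {
            println!("same contents: {}", hash_file(a)? == hash_file(b)?);
        } else {
            println!("different sizes, definitely different");
        }
        Ok(())
    }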


But isn't SHA-256 hardware-accelerated on most modern Intel CPUs? Is it faster than hardware-accelerated SHA-256?


> isn't SHA-256 hardware-accelerated on most modern Intel CPUs

No. Most Intel CPUs lack SHA-NI extensions. It looks like they may have finally started adding it to their 2020+ mainstream CPUs, but for a long time it was only present on their low-end Atom and similar CPUs. AMD CPUs have all had SHA-NI support since Zen.

https://en.wikipedia.org/wiki/Intel_SHA_extensions
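
If you want to check a particular machine, runtime detection is a one-liner in Rust ("sha" is the CPU feature flag for the SHA extensions; sketch only):

    fn main() {
        // Runtime CPUID check; prints true on CPUs with SHA-NI.
        #[cfg(target_arch = "x86_64")]
        println!("SHA-NI available: {}", is_x86_feature_detected!("sha"));
    }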


Nah. It's like ChaCha: it's only faster than AES on CPUs without hardware acceleration. I still use SHA-2 everywhere.


The performance story here is more complicated than you might think. ChaCha and AES (in the typical AES-CTR or AES-GCM modes) have a lot in common structurally. In both cases, different blocks can be computed in parallel. Software and hardware implementations can both take advantage of that parallelism to improve performance.

The story is different with SHA-2, which is entirely serial by design. There's no possibility of parallelism between different blocks of a single SHA-2 input. That means that there's little incentive to create a parallelized hardware implementation of SHA-2, and I'm not aware of any. In contrast, BLAKE3 is more like AES-CTR and ChaCha in that (given a long enough input) it can take full advantage of any degree of parallelism. For this reason, given reasonably modern SIMD support like AVX-2, BLAKE3 tends to be faster than hardware-accelerated SHA-256, even before multithreading gets into the picture.


Well, I'm sure there are use cases where it matters, but in general SHA-256 is already really fast. Why bother with something faster?


> in general SHA-256 is already really fast. Why bother with something faster?

Have you never sat there waiting for a hash to finish computing? I regularly do, and I'm not even dealing with large data. It's very slow, and worse than that, it's entirely sequential! (So is BLAKE2; BLAKE3 is parallel.)

Worse still, it's an astronomical waste of energy if we could be doing the same thing in less time. I wonder how many hashes humanity runs a day?


I'm curious what sort of files you are hashing where ~0.5 GB/s is so slow as to cause a noticeable delay, and yet the file is not "large".

For that matter, if your file is on disk, I would expect that SHA-256 already runs faster than reading from disk, so it's not like BLAKE3 is going to improve on that.


I heard the problem is so prevalent that they started making custom ASIC boards for calculating SHA-256 and building big farms full of them. At least we know that each and every one of the hashes calculated were useful and necessary, right?


I don't know why you're being snarky at me?

We need cryptographic hashes for many legitimate applications completely outside crypto-currencies, many of which protect critical things like national security and personal privacy.


The snark was directed at the people who do proof-of-work cryptocurrency, not at you. My bad for not putting a winky face or something.


To be fair, running the AVX-512 implementation of BLAKE3 on all cores is a pretty intense power draw :)


SHA-256 is also normally implemented using AVX-512.

And BLAKE3 is faster even on a single core, so it's strictly less compute time using the same instruction set.


That's Internet Explorer levels of shortsightedness, my friend.


I found the footnote quite interesting:

> Some software, notably git, is still using SHA-1, and relying on the fact that the best publicly-known method of generating SHA-1 collisions costs 2⁶⁹ computations, which is expensive. I think it is unwise to rely on this for two reasons. One is that there could be more efficient techniques to compute SHA-1 collisions that we don’t know about. Another is that the cost of doing 2⁶⁹ computations is falling rapidly—at the time of this writing (March 22, 2014), the Bitcoin network is performing enough computation to generate SHA-1 collisions every 131 minutes!

By guesstimating from the graph on the linked site, it seems the Bitcoin network was at about 100 PH/s at the time, versus about 185 EH/s now, which is close to a 2000x increase in hashrate since this blog post went live. Put differently, 2⁶⁹ ≈ 5.9×10²⁰ hashes is now roughly three seconds' worth of work at the whole network's current rate.


Hash function transition document from the Git documentation: https://git-scm.com/docs/hash-function-transition/


This isn't really true. git is relying on the second-preimage resistance of SHA-1 rather than its collision resistance. There are a few attacks that are theoretically enabled by finding SHA-1 collisions, but they aren't really practical.


And Linus was told of this and, in typical Linus fashion, he dismissed it.


Is there some way of co-opting the bitcoin network into calculating specific hash collisions?


You would need 6 zettabytes per second to even store all the hashes generated by the BTC network. That’s many times the world’s storage capacity generated every second.


No; it's a different algorithm.


And was he wrong?


Yes. He chose to design his system with no ability to migrate, using a hash algorithm that was already known to be flawed.

https://www.metzdowd.com/pipermail/cryptography/2017-Februar...


Oh, you're not talking about him dismissing news of practical attacks as a direct threat, you're talking about him being dismissive very early on about the idea that he was doing a bad job of picking a hash.

Yeah, okay, he was wrong to be dismissive there.


From the thread, it looks like Linus' threat model for git is simply different from the common use today. Of course a judgement under a different threat model would make it seem like a bad tradeoff!


It's more that Linus:

1. Didn't understand the attack

2. Doesn't understand security

3. Doesn't care

He's shown this repeatedly for decades.


Can we have a hash function that is theoretically secure, rather than just "we shuffled a bunch of bits, and nobody yet knows how to unshuffle them, but in 20 years someone might discover how to"?

For example, encrypting the data with a public key where nobody knows the private key ought to do the job; say, public key = pi. Then use the encrypted data as the 'hash' (or some shortened version of it, by discarding bits).

Yes, I know it would be slower, but it might be better to pay the performance cost than to have to move everything to a new algorithm every 20 years.


It would be slower, but it would also likely be less secure. Hashes are (roughly) based on block cipher cryptography, which is arguably a more straightforward research project: the rate of practical advancements on (say) finite field dlog cryptography is scarier than that of (say) new applications of differential cryptanalysis on block cipher cores.

Hashes have kind of a bad rap because we happened to get a whole string of 1990s-vintage hash functions that survived into the 2010s. AES had mostly just DES-EDE to replace. You can find somewhere a tweet from JP Aumasson, one of the Blake/Blake2/Blake3 authors, suggesting that we might never break SHA-2. SHA1 had a low (again: 1990s) security margin (2^80 collision resistance), among other problems (notably: susceptibility to differential cryptanalysis).

(It's also possible that I've said something dumb enough that one of the other Blake authors will smack me down and improve the thread).

The reasons for looking at SHA3 and Blake2 or Blake3 don't have that much to do with security, as much as they have to do with speed and flexibility.



More directly and earlier (and, amusingly, as a reply to tptacek):

https://twitter.com/veorq/status/834872988445065218

The line of thinking is more fully (if less sha2-specifically) covered in Aumasson's 'Too much crypto' paper, which has also had HN discussions.


If you can compute a hash in polynomial time, then you can also verify a potential pre-image (i.e. input) in polynomial time, which puts the problem of finding a pre-image in NP. Hence proving pre-image resistance would imply P != NP, unless computing the hash itself can't be done in polynomial time.

So for the time being the answer is no, we can't have a hash function that is theoretically secure. Your suggestion just moves the problem onto proving that the public key cryptography is secure, which suffers from the same problem.


Technically the time is constant, since some finite set of inputs is enough to generate all hashes. At the very least we'll need to be careful about what the time is 'polynomial' in.

For what it's worth, I see no reason you couldn't identify some NP-hard problem for which the original input is the solution. This means that reversing the hash directly is out of the question, but it only works if the way you generate the NP-hard problem is itself difficult (or ideally random) enough that it can't be reverted.

Sounds like a lot of effort, but who knows, it might be worth it in some cases.


The difficult part of basing a one-way function around NP-hardness isn't so much the reduction itself as it is the ability to sample from the "hard core" of the distribution.

For example, say you base the hardness of your OWF on the NP-hard problem of finding a Hamiltonian Cycle in an n-node graph. Your reduction could be completely valid and sound, meaning that pre-imaging your function would require solving an NP-hard problem. But, maybe surprisingly, HamCycle is actually extraordinarily easy for an overwhelming majority of n-node graphs. This is the result of a phase transition that happens when the degree of a random graph passes O(log n) or so, where suddenly the probability of such a graph having a HamCycle becomes 1-o(1) > 0.999999 for n sufficiently large, and a simple greedy algorithm can find them (ref. Pósa '76). And most graphs have average degree closer to n/2 than log(n), meaning that finding a HamCycle is trivial.

The crux, finally, is that basing your problem on an overwhelmingly trivial problem, even if technically NP-hard, is still not going to work for security. It will be little consolation that perhaps one customer's account was not broken into when the other hundred million accounts were cracked in milliseconds.

A solution to this problem would require researchers to find an NP-hard problem that is hard on average when sampled from an efficiently-sampleable distribution (plus a couple of other technical caveats). Whether or not there is a good way to do that, for any NP-hard problem, is still a major open question, and not for lack of trying.


Interesting, I wrongly assumed it would be easy to have an NP-hard problem that's also NP-hard on average. Thanks for the great response.


There's this (called SWIFFT): https://en.wikipedia.org/wiki/SWIFFT

There are also trivial constructions that exploit the difficulty of Discrete Log. But they are slow.

You do have to be careful when using these hash functions, because they satisfy potentially undesirable properties like H(a + b) = H(a) + H(b), which make them unusable as random oracles.
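
As a toy illustration of that structure (deliberately tiny, insecure parameters), here is the multiplicative form of that homomorphism for H(x) = g^x mod p:

    // Toy dlog-based "hash": H(x) = g^x mod p, with made-up tiny parameters.
    // Inverting it means solving a discrete log, but it is badly non-random-oracle-like:
    // H(a + b) == H(a) * H(b) mod p, as the assert below demonstrates.
    fn mod_pow(mut base: u128, mut exp: u128, p: u128) -> u128 {
        let mut acc = 1u128;
        base %= p;
        while exp > 0 {
            if exp & 1 == 1 {
                acc = acc * base % p;
            }
            base = base * base % p;
            exp >>= 1;
        }
        acc
    }

    fn main() {
        let (p, g) = (1_000_000_007u128, 5u128); // toy modulus and generator
        let h = |x: u128| mod_pow(g, x, p);

        let (a, b) = (123_456u128, 789_012u128);
        assert_eq!(h(a + b), h(a) * h(b) % p);
        println!("H(a+b) = {}", h(a + b));
    }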

No cracks have appeared in SHA-2 in the nearly two decades it has existed. So unless it's possible to have a sudden and totally inexplicable break, we'll probably be safe for many decades hence.


> Yes, I know it would be slower, but it might be better to pay the performance cost than to have to move everything to a new algorithm every 20 years.

What if your approach turns out to be insecure as well in 20 years? Who knows what will happen in that time frame; it's a long time in our interconnected world.

What if instead of trying to find something that works forever, you embrace change and make it easy to swap the algorithm instead? So when you need to change in the future (not "if"), you can just switch a value to move to the new one. Have solid testing in place for it already. Don't assume invariants that change between hash functions, like the output length or which byte values can appear.
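
A sketch of what that can look like in practice (hypothetical helper, assuming the sha2 and blake3 crates): store the algorithm name alongside the digest, so switching later is just a new match arm plus a config change:

    use sha2::{Digest, Sha256};

    // Hypothetical "agile" digest: the algorithm name travels with the hash value.
    fn agile_hash(alg: &str, data: &[u8]) -> String {
        let digest: Vec<u8> = match alg {
            "sha256" => Sha256::digest(data).to_vec(),
            "blake3" => blake3::hash(data).as_bytes().to_vec(),
            _ => panic!("unknown algorithm: {alg}"),
        };
        let hex: String = digest.iter().map(|b| format!("{b:02x}")).collect();
        format!("{alg}:{hex}")
    }

    fn main() {
        let data = b"hello world";
        // Later, a verifier parses the prefix and dispatches on whatever it finds.
        println!("{}", agile_hash("sha256", data));
        println!("{}", agile_hash("blake3", data));
    }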


> Can we have a hash function that is theoretically secure, rather than just [...] and nobody yet knows how

> For example, encrypting the data with a public key where nobody knows the private key ought to do the job

Errr... public key cryptography is only "theoretically secure" because we assume "nobody yet knows how..." to solve some hard problems like prime factoring, discrete log, etc.


What's different about encryption algorithms that make them not fundamentally "shuffling a bunch of bits"?

Anyway, BLAKE is based on ChaCha.



