many of the decisions made in cryptographic algorithms (ie why this constant? why this hardcoded value?) are made as a trade off for security concerns, but also for memory size, performance during calculations, and feasibility of hardware implementation.
A good cryptographic algorithm is a balanced trade-off between all those factors (and more, for example, resistance to cryptanalysis attack would be something desirable as well)
It's not that useful to ask "why" when it comes to cryptographic primitives since sometimes the answer is "because this constant works well and gives us the performance characteristics we were looking for"
In particular, the constants are very critical to the algorithm's security. If the constants are maliciously chosen, the algorithm could be "backdoored" with an exploitable structure. For example, if you modify just four 32-bit constants in SHA-1, without touching any other operations, it becomes nearly trivial to obtain collisions (current SHA-1 collisions have required enormous amounts of compute to achieve): https://malicioussha1.github.io/
The constants for SHA-256 are chosen to be simple mathematical constants, reducing the number of "degrees of freedom" for an attacker to manipulate the standard. At the same time, the constants need to be "random" enough to avoid producing exploitable structure. It's a careful balancing act! Roots of primes and functions or digits of transcendental numbers (e, pi, etc) make good random-ish numbers which don't provide much room for manipulation.
> Roots of primes and functions or digits of transcendental numbers (e, pi, etc) make good random-ish numbers which don't provide much room for manipulation.
There's a big difference between symmetric and public key cryptography when it comes to this sort of explainability.
Public key cryptography like RSA, Elliptic Curves and Lattices have elegant mathematical theories behind them. Learning those will teach you much about why things are the way they are and how the security boils down to well known hard problems.
Symmetric key cryptography like SHA2, SHA3 and AES don't have nearly as much theory behind them. It's a bit of a dark art where functions are designed to thwart known attacks. There is no foundation like a proof that shows breaking SHA2 is equivalent to prime factorization or anything like that.
This doesn't necessarily have to be the case, there is research in 'provable secure hash functions', but these tend to perform worse, and we value performance a lot in symmetric crypto.
And yet...it's because of that theory that public key crypto is known to be more vulnerable to quantum computing attacks. Quantum computers can't break modern symmetric crypto.
Indeed, understand-ability is a double-edged sword in cryptography. In general it's best to follow Kerckhoffs's principle and make sure you understand why it works, or risk that your adversary learns more than you do.
> Quantum computers can't break modern symmetric crypto.
Much like we don't have proofs that it is classically secure we also don't have proofs it is secure in a quantum setting. But absence of proof is not proof of absence. In fact we know that Grover's algorithm halves the strength of any hash function. In general you should assume that symmetric cryptography is easier to break on a quantum computer than on a classical one because they are strictly more powerful.
In case of RSA and Elliptic Curves, we known there are theoretical quantum algorithms that undermine the security. But for Lattice cryptography there are actually proofs that they remain secure in a quantum setting.
> It's an algorithm, so you're probably looking for a higher level block diagram or something like a pseudocode explainer?
Not so much. What I'd really like is a differential design analysis, i.e. "if it didn't do [this step], then the output distribution would look like [graph] instead of [graph], or it would be vulnerable to [attack]" or "if, instead of doing [step] here, it did [other step], then we would have what is known as the [some other algorithm]."
the replies from nneonneo and remcob shed a bit of light.
magic numbers and magic steps are part of the algorithm. It's not a result of a "planned" proof but more about choosing the right constants and right operations to resist attacks and decasteve added, to be obvious by using numbers that are generally known to be calculable (instead of obscure random numbers, you pick random numbers that anyone knows - like 1st 10 digits of pi, etc)
There are many different hash algorithms where they do indeed alter some of the rotations and the constants. and from time to time they are found to be vulnerable to collisions. For example, SHA1 had weaknesses, and I highly suspect the "apparantly magic" steps for SHA2, which you see here, are to "fix" the weaknesses of SHA1
I think it's useful to investigate how SHA1 works/breaks, and how the steps in SHA2 fixes them - that's how I would start in learning how these steps work. Going straight to SHA2 would be a bit harder because SHA2 specifically has steps to fix SHA1 weaknesses, and if you didn't know what those weaknesses were, it would seem like magic.
It actually isn't important to know much about the internals to grasp this attack, let's see why quickly:
Your cryptographic hash function maintains a bunch of state which is manipulated by shoving data through it. One obvious way to get the output is just serialise all of that state, and this is what the Merkle–Damgård hashes like SHA-256 do.
That's how the Length Extension becomes possible. The details of how that state is created don't matter to the attack, because the hash output itself provides an attacker with the complete internal state and they can extend from there. [There are some small subtleties around padding].
In SHA-3 (and all Keccak variants and similar modern hashes) there is instead a "squeeze" mechanism that takes the internal state of the hash and outputs some bits derived from that state. The state inside SHA-3 is always 1600 bits, that'd be a real pain to write down compared to a 256 bit SHA-256, but the "squeeze" can give you say, 256 bits like you had with SHA-256. Because this isn't the actual internal state of the hash an adversary can't do anything useful with it, such as the Length Extension Attack.
Since the input fits in one chunk it’s hard to follow how it would work for larger inputs.
The “chunk loop” is only executed for one iteration so it requires careful reading to see how the state gets form one iteration to the next. (Hint: it’s the h0-h7 hash values).
Perhaps it could be stated that h0-h7 is the state that is carried over between iterations?