Everyone should write at least one whole application in assembly, just to appreciate how the hardware really works.
Unfortunately, with out-of-order execution and instruction-level parallelism, I doubt learning assembly teaches you much about how the hardware really works.
Microarchitecture doesn't change the fact that the instructions in your program - the ones that you can work with - still have the same programmer-visible behaviour (except perhaps running a little faster).
I don't dispute that. I'm saying the model that you learn from learning assembly is very different to what the hardware is doing.
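Concretely, from learning assembly you might assume that each core has a set of physical registers corresponding one-to-one to the registers you see, and that isn't the case: an out-of-order core typically renames the architectural registers onto a larger pool of physical registers.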
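To make the register point concrete, here is a toy sketch of register renaming in Python. It is only an illustration under loud assumptions: the `rename` function, the three-operand instruction tuples and the `p0`, `p1`, ... physical register names are all made up for this example, the pool of physical registers is treated as unlimited, and no real core is implemented this way.

```python
# Toy model of register renaming. Hypothetical names throughout; real cores
# have a finite physical register file and far more machinery than this.
from itertools import count


def rename(instructions, arch_regs=("rax", "rbx", "rcx")):
    """Rewrite (dest, src1, src2) triples to use physical register names."""
    fresh = count()  # assumption: an unlimited supply of physical registers
    # Each architectural register starts out mapped to some physical register.
    mapping = {reg: f"p{next(fresh)}" for reg in arch_regs}
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read whichever physical register currently holds the value.
        s1, s2 = mapping[src1], mapping[src2]
        # Every write to an architectural register gets a fresh physical one.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((mapping[dest], s1, s2))
    return renamed


program = [
    ("rax", "rbx", "rcx"),  # rax = rbx op rcx
    ("rbx", "rax", "rcx"),  # rbx = rax op rcx (reads the rax just written)
    ("rax", "rcx", "rcx"),  # rax = rcx op rcx (same name, unrelated value)
]
result = rename(program)
for before, after in zip(program, result):
    print(before, "->", after)
```

The two writes to `rax` end up in different physical registers, so the third instruction does not have to wait for the first two even though the assembly reuses the same register name. That is the gap between the model you learn from assembly and what the hardware does.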
Edit: To the downvoter, care to comment?