Looks cool. But why asm? Generating C or C++ code and letting the GCC optimizer do the heavy lifting for you could probably get you a massive perf boost.
Absolutely. The number of tricks used to achieve that is amazing, including things like register renaming, branch prediction, virtual registers, instruction reordering, etc.
That is not really what is demonstrated here though, as gcc was run with optimizations turned off; otherwise the program would have been optimized down to printing a constant.
[1] ecee.colorado.edu/ecen4553/fall09/notes.pdf