The real important bits: "Front-end CPU stalls - The workload wastes a lot of time in CPU instruction cache misses, instruction TLB misses, and to a lesser extent branch mispredictions".
My educated guess is that it relocates the hot path of the text segment to better pack into the instruction cache. Cool.
My educated guess is that it relocates the hot path of the text segment to better pack into the instruction cache. Cool.