Usually simpler code is better. I used to see a lot of misguided "high performance" code where someone had unrolled all of the loops because they thought unreadable code runs faster, though maybe people have gotten better about this? (OpenSSL is an example here.)
Unfortunately, not every kind of slowness is confined to hotspots. CPU cycles usually are, but if an occasional task uses all available memory, it's going to mess up everything downstream as well.
I wonder where the trade-off is for loop unrolling.
Like, unrolling a `for` loop that only has 5 iterations makes sense. But if you have 100 iterations, the unrolled code takes up a lot more space in the instruction cache, and that might actually make it slower than just keeping the `for` loop.
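To make the trade-off concrete, here is a rough sketch in C of the same summation written as a plain loop and as a loop hand-unrolled by a factor of 4. The function names `sum_rolled` and `sum_unrolled4` and the unroll factor are made up for illustration and aren't taken from OpenSSL or any other real codebase.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: compact code that the compiler is free to unroll on its own. */
uint64_t sum_rolled(const uint32_t *a, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Hand-unrolled by 4 (an arbitrary factor chosen for this sketch):
 * fewer loop-condition checks per element, but a larger loop body. */
uint64_t sum_unrolled4(const uint32_t *a, size_t n) {
    uint64_t total = 0;
    size_t i = 0;

    /* Main body: handle four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }

    /* Tail: handle the 0 to 3 elements left over. */
    for (; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Both functions return the same result; the unrolled one just trades code size for fewer branches. Whether that pays off depends on the CPU and the compiler, so it's worth measuring both versions. Optimizing compilers can also unroll loops themselves (GCC has `-funroll-loops`, for instance), which is often a good reason to leave the source in the simple form.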