
x86 has done this since the P6; in fact, the performance difference would be much worse if it didn't. This particular case behaves that way because there are actually two conditional branches: one for the loop itself, and one for the check inside the loop. The CPU is able to execute both paths of one branch in parallel, but not the 4 possible paths of two branches in a row.
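
To make the "two branches" point concrete, here is a rough sketch of the kind of loop I mean. The actual code being discussed isn't quoted in this comment, so the function name `sum_at_least` and the threshold parameter are made up purely for illustration:

```c
#include <stddef.h>

/* Hypothetical loop for illustration only; not the code from the thread. */
long sum_at_least(const int *data, size_t n, int threshold)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {    /* branch 1: the loop condition */
        if (data[i] >= threshold) {     /* branch 2: the data-dependent check */
            sum += data[i];
        }
    }
    return sum;
}
```

The loop branch is taken on every iteration except the last, so it is easy to guess. The inner check depends entirely on the data, and every iteration puts the two branches back to back. Note that an optimizing compiler may turn the inner `if` into a conditional move, in which case only the loop branch is left.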

(I'm quite tempted to get out one of my P55Cs to try it out...)



