By far the most interesting part of this post is the update with newer compilers...

winstonewert · on Sept 15, 2017

The update with the newer compilers was done back in 2012. There have only been minor changes since then. (That's not to say it isn't interesting; I mention it only so that people don't go to the post expecting new content and come away disappointed.)

taeric · on Sept 15, 2017

I updated to make this clearer at the top level. Thanks!

mikeash · on Sept 15, 2017

Thanks for pointing that out. I'd seen this before so I wasn't going to bother clicking the link, not knowing there was a cool new thing there.

taeric · on Sept 15, 2017

Glad it was worth clicking. I see I accidentally misled some folks into thinking it was recently updated.

prirun · on Sept 16, 2017

Yes, but check out the Wikipedia page for Intel's C compiler. It points out that Intel's compiler selects optimal code if the CPU reports "GenuineIntel", but selects the least optimal code if the CPU is non-Intel. AMD sued Intel for this, and now Intel has a mumbo-jumbo disclaimer that it may not generate optimal code on a non-Intel CPU (without disclosing that it generates the least optimal code).

augusto2112 · on Sept 15, 2017

Can someone explain how interchanging the loops makes it immune to mispredictions, please? I don't get it.

taeric · on Sept 16, 2017

If I'm reading it right, the Intel compiler just took advantage of the outer test loop. The code was supposed to loop over the whole data set many times. Instead, it looped over each item of the data many times. It got the same answer, but the branch predictor had a much easier job, because the comparison for a given item comes out the same way on every repetition.
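Here is a minimal sketch of what that interchange looks like, assuming the benchmark has the shape described above (an outer repeat loop wrapped around a conditional sum over the data). The function names, the `passes` parameter, and the `128` threshold are illustrative assumptions, and this is hand-written source, not the compiler's actual output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Original order: every pass walks the whole array, so on unsorted data the
// outcome of `data[c] >= 128` changes unpredictably from one test to the next.
std::int64_t sum_original(const std::vector<int>& data, int passes) {
    std::int64_t sum = 0;
    for (int i = 0; i < passes; ++i)
        for (std::size_t c = 0; c < data.size(); ++c)
            if (data[c] >= 128) sum += data[c];
    return sum;
}

// Interchanged order: the inner loop repeats the same element `passes` times,
// so the comparison has the same outcome for the whole inner loop and the
// branch only changes direction when the outer loop moves to the next element.
std::int64_t sum_interchanged(const std::vector<int>& data, int passes) {
    std::int64_t sum = 0;
    for (std::size_t c = 0; c < data.size(); ++c)
        for (int i = 0; i < passes; ++i)
            if (data[c] >= 128) sum += data[c];
    return sum;
}

// Both orders add exactly the same terms, just in a different sequence.
void check_same_result(const std::vector<int>& data, int passes) {
    assert(sum_original(data, passes) == sum_interchanged(data, passes));
}
```

Because integer addition doesn't care about order, the two functions return the same sum; only the pattern of branch outcomes the predictor sees is different.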

jnordwick · on Sept 15, 2017

Can you point those out, please? I don't see any substantive edits in the last few years. Mostly just changes to the notice, the bounty award, and a rollback this year.

taeric · on Sept 15, 2017

Updated my post. I merely meant the section. It was still old. :)