And yet - line 34 of the spreadsheet shows "xor r8d,r8d" as being quite expensive. My theory is that the branch which targets it is being mispredicted, and the xor gets blamed for the cost of the misprediction.
Similarly, line 24 of the spreadsheet shows "cmp ebx,eax" as being quite expensive. The only explanation I can come up with is that the previous instruction was blocked by a cache miss and then the "cmp" must have been folded in with the add so that mumble mumble nope I don't have a good mental model.
So, there are quite a few examples of instructions which are obviously always fast on their own, and yet are hit by a huge portion of the CPU samples.
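To make the "it gets blamed" theory concrete, here is a toy model of how samples could land on the wrong instruction. It is only a sketch under made-up assumptions, not a description of how any real CPU or profiler attributes samples: the instruction names, the cycle costs, the sampling period, and the `skid` parameter are all hypothetical. The idea is that a sample which fires while a slow instruction is stalled gets charged to the instruction after it.

```python
# Toy model of sample attribution. All numbers and names are hypothetical.
# Each instruction occupies `cost` cycles; a sample fires every `period`
# cycles and is charged to the instruction in flight at that moment, shifted
# forward by `skid` instructions.

def attribute_samples(costs, period, skid=0, iterations=1000):
    """Return per-instruction sample counts for a loop run `iterations` times."""
    counts = [0] * len(costs)
    cycle = 0             # cycles elapsed so far
    next_sample = period  # cycle at which the next sample fires
    for _ in range(iterations):
        for i, cost in enumerate(costs):
            cycle += cost
            # Charge every sample that fired while instruction i was in flight.
            while next_sample <= cycle:
                counts[(i + skid) % len(costs)] += 1
                next_sample += period
    return counts


# Hypothetical loop body: one slow instruction followed by two cheap ones.
names = ["load (cache miss)", "cmp ebx,eax", "jne"]
costs = [100, 1, 1]

no_skid = attribute_samples(costs, period=7, skid=0)
one_skid = attribute_samples(costs, period=7, skid=1)

for name, a, b in zip(names, no_skid, one_skid):
    print(f"{name:20} no skid: {a:6}   skid of 1: {b:6}")
```

With a skid of one instruction, the cheap "cmp" ends up holding almost all of the samples that really belong to the stalled instruction before it, which is the kind of pattern the spreadsheet shows.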