Speculative execution in computer architecture terminology is reserved for specu...

Symmetry · on Jan 4, 2018

Yes, and the A53 speculatively executes past branches, otherwise there's little point in having a branch predictor. The difference between it and an out of order processor is that the depth of speculation is limited by the pipeline length rather than the reorder buffer.

EDIT: The pipeline layout severely constrains what sort of speculation can happen and, thinking about it more, I'm really surprised that the A8 can finish load and issue a second one before the mis-speculated branch is resolved and the loads are quashed.

EDIT2: Well, that's not the only difference obviously. The big one that makes a processor in order versus out of order is whether a stalled instruction causes all subsequent instructions in the stream to stall or just ones with data dependencies on it. But the problem of quashing bad instructions is shared by all processors that are pipelined and which can throw exceptions, not just those that continue speculatively issuing past predicted branches.

fulafel · on Jan 4, 2018

I'm not familiar with the A53, maybe it is some kind of hybrid, but certainly there are branch predictors in pipelined in-order superscalar processors that don't speculate past branches, like Pentium or 21064. The advantage is that a correctly predicted branch can be executed quickly and the following instruction can be fetched & decoded. This doesn't yet require register renaming, reservation stations etc.

Symmetry · on Jan 4, 2018

I think we have a conflict of definition? I would call instruction fetch and decode behind an unresolved branch speculation. If you have a machine with a single execution stage which can't issue instructions in parallel with the branch then of course you can't have these speculated instructions execute because whether they are valid will always be resolved before they get to the execution or writeback stages. But the mechanics of putting instructions into the pipeline on the basis of speculation and quashing them if the branch resolves differently than predicted is the same and re-use the same hardware mechanisms that handle exceptions, making the speculative issuing of instructions a clear win from a design perspective. It's the details of the pipeline that determine whether it's possible to trigger the Spectre vulnerability before the branch resolves.

Register renaming isn't required for speculation and you can even design out of order micro architectures without it. On an in-order processor you just have to make sure that writeback occurs after the branch is resolved and just flush the pipeline above the branch if the branch resolves incorrectly. No need to mess about with checkpoints because the register state is always valid. Well, there are some considerations with store commands too if you're using superscalar execution but it's still pretty easy. But again, these are all mechanisms you need in any event because your processor has to deal with interrupts that might be thrown at any moment by external triggers or memory faults.

EDIT: I suppose it's possible in theory that you could look at an upcoming branch, predict how it will resolve, and then fetch the relevant instruction data into your instruction cache. But I'd tend to call that an instruction pre-fetcher by analogy to the data pre-fetcher eveyone uses rather than a branch predictor. And I'm not aware of anyone every having done that.

fulafel · on Jan 4, 2018

Here's the 1997 Intel optimization manual: https://www.ece.cmu.edu/~ece548/localcpy/24281603.pdf

In 2.1.3, that pertains to the original Pentium processor, it describes how the predicted branch target instruction is fetched.

This is not commonly considered speculative execution in computer architecture terminology.

In 3.2.1 it describes how on later OoO processors the fetched instructions are speculatively executed.

Here in the description of the 21064, another in-order processor, the pipeline is described (under "Conditional-Branch Pipeline Flow") as potentially taking zero cycles for correctly prediced branches, meaning that the instruction decode is also pipelined: http://collaboration.cmc.ec.gc.ca/science/rpn/biblio/ddj/Web...

Symmetry · on Jan 4, 2018

You're right, that's speculative issue rather than speculative execution. I was being sloppy in my terminology when I called it that. It is speculation, however.