Has anyone made a processor that speculatively executes both branch options? I guess good simulations would show whether that or branch prediction is better.
I think this is called dual or multi-path execution if you want to Google for it.
http://people.cs.clemson.edu/~mark/eager.html says: "...I've seen it alleged that some mainframe processors in the 1960s and 1970s executed down both paths beyond a branch; but, as far as I'm aware, multiple-path execution has never been done by a commercial processor."
From what I recall reading about this in the academic literature, the primary concern here is that speculative operations on the alternate paths consume power just like any other operation.
I think there were some older mainframes (CDC? System 360?) that may have done this to some extent. It'd only really be feasible up to a certain (very low) point though, since the number of possible execution paths can explode exponentially if more branches come up before you've resolved the one you're speculating on. That's especially true in the deeper pipelines of modern machines, with perhaps 3- or 4-wide issue and a dozen or so cycles before a branch gets resolved.
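To put rough numbers on that explosion, here's a back-of-the-envelope sketch in C. The helper names and the "one branch every 6 instructions" figure in the usage note are my own assumptions for illustration, not measurements, and the result is a crude upper bound that ignores the fact that all the paths would be sharing the same fetch and issue bandwidth.

```c
#include <stdint.h>

/* Hypothetical helper: rough count of branches still unresolved along one
 * path, i.e. instructions issued before the first branch resolves divided
 * by the assumed spacing between branches.
 * insns_per_branch must be non-zero. */
unsigned unresolved_branches(unsigned issue_width,
                             unsigned resolve_cycles,
                             unsigned insns_per_branch)
{
    return (issue_width * resolve_cycles) / insns_per_branch;
}

/* Hypothetical helper: number of live paths if every unresolved branch
 * is forked both ways (2^unresolved), saturating at UINT64_MAX. */
uint64_t live_paths(unsigned unresolved)
{
    return unresolved < 64 ? (uint64_t)1 << unresolved : UINT64_MAX;
}
```

With 4-wide issue, 12 cycles to resolve, and an assumed branch every 6 instructions, unresolved_branches(4, 12, 6) comes out to 8, and live_paths(8) is 256 paths to keep fed at once.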
x86 has done this since the P6; in fact, the performance difference would be much worse if it didn't. This particular case behaves that way because there are actually two conditional branches: one for the loop itself, and the other for the check inside the loop. The CPU is able to execute both paths of one branch in parallel, but not the 4 possible paths of two branches in a row.
(I'm quite tempted to get out one of my P55Cs to try it out...)
Isn't that how Itanium is supposed to work? I think specific instructions would have to be issued to force it to happen, but I seem to recall the line "predication, not prediction" being listed as one of its features, if you want to call it that.
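For anyone who hasn't seen predication, here's a rough C-level analogy of the idea (it is not Itanium code, and the function names are made up for illustration): instead of branching to one arm, you compute both arms and keep only the result whose predicate is true. A compiler is free to turn either version into a branch or a conditional move, so treat this as a sketch of the concept rather than a way to control what the hardware does.

```c
#include <stdint.h>

/* Branching form: the hardware has to guess which arm will run. */
uint32_t select_branch(int cond, uint32_t a, uint32_t b)
{
    if (cond)
        return a + 1;
    return b - 1;
}

/* If-converted form: both arms are computed unconditionally and a mask
 * (standing in for a predicate) keeps only one of the results. */
uint32_t select_predicated(int cond, uint32_t a, uint32_t b)
{
    uint32_t then_val = a + 1;                     /* "taken" arm */
    uint32_t else_val = b - 1;                     /* "not taken" arm */
    uint32_t mask = 0u - (uint32_t)(cond != 0);    /* all ones if cond, else 0 */
    return (then_val & mask) | (else_val & ~mask);
}
```

The cost is the same one mentioned upthread: the work for the discarded arm is still done, so it only pays off when the arms are short.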