I read about Xeon Phi a few months ago and I really want to get my hands on one. My problems are embarrassingly parallel (or nearly so). That said, does anybody know how each Xeon Phi core performs relative to a modern Intel processor (i7 or Xeon) on standard numerical code (Linpack, etc.)?
They're Pentium-class x86 cores, barely more than front-end control processors for the vector hardware. The fact that it's x86 is almost incidental, IMHO; the vector ISA is all programmers should really care about on the Phi.
Maybe I'm missing something, but do in-order architectures even have much use for branch prediction? They can't speculatively execute based on the outcome of a conditional, right?
Sure they can. Branch prediction allows you to move an instruction along the pipeline before the instruction determining its outcome has been retired. Without branch prediction, every conditional jump can stall the pipeline. With branch prediction, a correctly predicted branch executes quickly, and a mispredicted branch results in a pipeline flush.
Instruction re-ordering is more about taking full advantage of multiple execution units (ALUs, etc.), or not completely stalling the pipeline to wait on a memory fetch.
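To put rough numbers on the branch prediction point above, here is a toy cycle count in Python. It's only a sketch: the 4-cycle penalty, the 2-bit saturating counter, and the names `TwoBitPredictor` and `branch_cycles` are my own assumptions for illustration, not a description of the actual Phi pipeline. It also pessimistically assumes that without prediction every branch stalls for the full penalty.

```python
class TwoBitPredictor:
    """Hypothetical 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.state = 2  # assumed starting point: weakly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def branch_cycles(outcomes, penalty=4, predictor=None):
    """Count cycles spent on a sequence of branch outcomes (True = taken).

    penalty is an assumed number of cycles lost either to a stall
    (no predictor) or to a flush (misprediction).
    """
    cycles = 0
    for taken in outcomes:
        cycles += 1  # the branch instruction itself
        if predictor is None:
            cycles += penalty  # no prediction: wait for the branch to resolve
        else:
            if predictor.predict() != taken:
                cycles += penalty  # wrong guess: flush the wrong-path instructions
            predictor.update(taken)
    return cycles
```

For a typical loop branch, taken 99 times and then not taken once (`[True] * 99 + [False]`), this model gives 500 cycles without prediction and 104 with the predictor, since only the final loop exit is mispredicted. None of that requires out-of-order execution.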
I was curious about this, since the point is that you can run "Xeon" code on the Xeon Phi, but the Phi doesn't support MMX, SSE, or AVX, so wouldn't you need to recompile to take advantage of the vector hardware?