Can anyone clarify his claim on MIPS data hazards? I didn't follow that one. To ...

walter_artica · on May 1, 2016

AFAIK, since 1991 (with the MIPS R4000) MIPS processors have included the necessary interlocks in the integer pipeline. Just check the reference manuals from that era (e.g. compare the pipeline sections of the R3000 and R4000 manuals).

Or else use a search engine to find sources like this: https://books.google.com.pe/books?id=LL52JBPU4CwC&pg=PA52

The article's author should review recent documentation on the MIPS architecture. I think the paragraph that begins "MIPS is the worst offender" will seem ludicrous to anyone working with modern MIPS implementations (say, post-1992!).

yuubi · on May 1, 2016

MIPS, the Microprocessor without Interlocked Pipeline Stages, doesn't ensure that the result of one instruction is available before executing a "later" instruction in the stream that refers to that result. Something like this (pseudo-asm with C-notation comments):

     ld r1, @r2   ; r1 = *r2
     add r3,r1,r4 ; r3 = r1+r4

wouldn't set r3=*r2+r4 because the memory access hasn't finished by the time the add runs.
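To make the hazard concrete, here is a toy Python model of a load delay slot with no interlocks. It's only a sketch under stated assumptions: the tuple encoding and the `run_no_interlock` helper are hypothetical, and the only timing rule modeled is that a load's result lands after the next instruction has already read its operands.

```python
# Toy model of a MIPS I-style load delay slot: no interlocks.
# Hypothetical tuple encoding (not real MIPS syntax):
#   ("ld", dst, addr_reg)     dst = mem[addr_reg]
#   ("add", dst, src1, src2)  dst = src1 + src2
#   ("nop",)
# Assumption: a load's result lands only after the NEXT instruction has
# already read its source registers, so that instruction sees the old value.

def run_no_interlock(program, regs, mem):
    regs = dict(regs)
    pending = None  # (register, value) of a load whose result hasn't landed yet
    for insn in program:
        op = insn[0]
        new_pending = None
        if op == "ld":
            _, dst, addr_reg = insn
            new_pending = (dst, mem[regs[addr_reg]])
        elif op == "add":
            _, dst, src1, src2 = insn
            regs[dst] = regs[src1] + regs[src2]  # may read a stale register
        elif op != "nop":
            raise ValueError(f"unknown op {op!r}")
        # The previous load's result becomes visible only now.
        if pending is not None:
            regs[pending[0]] = pending[1]
        pending = new_pending
    if pending is not None:  # drain a load at the very end of the program
        regs[pending[0]] = pending[1]
    return regs


regs = {"r1": 0, "r2": 100, "r3": 0, "r4": 5}
mem = {100: 42}
bad = run_no_interlock([("ld", "r1", "r2"), ("add", "r3", "r1", "r4")], regs, mem)
print(bad["r3"])   # 5: the add used the stale r1 (0), not *r2 (42)
good = run_no_interlock([("ld", "r1", "r2"), ("nop",), ("add", "r3", "r1", "r4")], regs, mem)
print(good["r3"])  # 47
```

Putting an independent instruction (or a nop) between the load and its use is the fix this kind of ISA expected software to make.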

pm215 · on May 1, 2016

This was only in the early versions of the MIPS architecture, though (MIPS I, I think?). Later versions required the interlocks, so the add would stall rather than misbehave, and you didn't actually need to put a nop in the load delay slot (though scheduling some useful insn into it was still worthwhile for performance). Since MIPS I implementations are a distant memory, in practice this misfeature (a misfeature in retrospect) can be ignored these days. (In contrast, branch delay slots cannot be forgotten, because you can't change the branch insn behaviour backwards-compatibly; the best you can do is add new branch instructions that don't execute the delay slot insn, which MIPS has also done to some extent.)
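The "put a nop in the load delay slot" chore can be sketched as a tiny assembler pass. This is a hypothetical illustration (made-up tuple encoding, hypothetical `sources` and `fill_load_delay_slots` helpers), not how any real MIPS toolchain is written; an interlocked pipeline effectively does the same job in hardware by stalling instead.

```python
# Sketch of the software workaround MIPS I needed: put a nop in the load delay
# slot whenever the very next instruction reads the register being loaded.
# Hypothetical tuple encoding, e.g. ("ld", dst, addr_reg),
# ("add", dst, src1, src2), ("nop",). A real assembler would first try to move
# an independent instruction into the slot; this sketch only pads with nops.

def sources(insn):
    """Return the set of registers an instruction reads."""
    op = insn[0]
    if op == "ld":
        return {insn[2]}
    if op == "add":
        return {insn[2], insn[3]}
    return set()


def fill_load_delay_slots(program):
    out = []
    for i, insn in enumerate(program):
        out.append(insn)
        is_load = insn[0] == "ld"
        if is_load and i + 1 < len(program) and insn[1] in sources(program[i + 1]):
            out.append(("nop",))  # an interlocked pipeline would stall here instead
    return out


print(fill_load_delay_slots([("ld", "r1", "r2"), ("add", "r3", "r1", "r4")]))
# [('ld', 'r1', 'r2'), ('nop',), ('add', 'r3', 'r1', 'r4')]
```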

Both load delay slots and branch delay slots let the microarchitecture (a simple 5-stage pipeline) dictate the architecture, which is a classic way to store up pain for the future.
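For comparison, a toy model of a branch delay slot shows why it can't simply be retired: the instruction after a branch runs even when the branch is taken, so existing binaries depend on that behaviour. The encoding and `run_with_branch_delay` are hypothetical, and this is a sketch, not a cycle-accurate model of any MIPS core.

```python
# Toy model of a branch delay slot: the instruction right after a branch
# always executes, and only then does control move to the branch target.
# Hypothetical tuple encoding: ("b", target_index), ("addi", dst, src, imm),
# ("halt",). Branches inside a delay slot are not modeled.

def run_with_branch_delay(program, regs):
    regs = dict(regs)
    pc = 0
    branch_target = None  # set by a branch, taken after the delay slot runs
    while pc < len(program):
        insn = program[pc]
        in_delay_slot = branch_target is not None
        next_pc = branch_target if in_delay_slot else pc + 1
        branch_target = None
        op = insn[0]
        if op == "b":
            if in_delay_slot:
                raise ValueError("branch in a delay slot is not modeled")
            branch_target = insn[1]
        elif op == "addi":
            _, dst, src, imm = insn
            regs[dst] = regs[src] + imm
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown op {op!r}")
        pc = next_pc
    return regs


program = [
    ("b", 3),                   # branch to index 3...
    ("addi", "r1", "r1", 1),    # ...but this delay-slot insn still runs
    ("addi", "r2", "r2", 100),  # skipped
    ("halt",),
]
print(run_with_branch_delay(program, {"r1": 0, "r2": 0}))  # {'r1': 1, 'r2': 0}
```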

russell · on May 1, 2016

The lack of interlocks really surprised me, even though the name said as much. The CDC 6600 had them two decades earlier. We always carefully scheduled our instruction flows, but it was nice to know that the hardware would catch our goofs.

aidenn0 · on May 1, 2016

And this got really fun when superscalar MIPS processors with instruction prefetching came out. They had to introduce a different NOP, called SSNOP, that stalled all ALUs. Obviously they couldn't just declare "NOP stalls all ALUs", as that would have serious performance effects in places where NOPs are necessary (e.g. branch delay slots).

protomyth · on May 1, 2016

I'm curious why they still had NOP as part of the name since it actually did something.

aidenn0 · on May 2, 2016

Well, NOP does nothing on one ALU; SSNOP does nothing on all ALUs.

And to give you an example of how you had to calculate things:

There were, if memory serves, 6 pipeline stages on the 5K, numbered 0-5, plus instruction prefetch, which was numbered -1. You subtracted the stage in which a subsequent instruction needed to see an effect from the stage in which the earlier instruction's effect took place, and the result was the number of intermediate stages that all ALUs would need to go through.

The worst-case scenario was modifying RAM that would later be read as an instruction: the write wouldn't take effect until stage 5, and instruction prefetch was stage -1, so you needed to make sure all ALUs were busy for 5 - (-1) = 6 clock cycles. In theory you could do the math to work out the scheduling for each ALU, but I just dropped 6 SSNOPs in there; since the code path was only hit while loading a new process, 6 wasted clock cycles were not a concern.
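The subtraction rule above, as a few lines of Python. The stage numbers come from the recollection in this comment and haven't been checked against a 5K manual; `ssnops_needed` is a hypothetical helper, and clamping the result at zero is an added assumption.

```python
# Sketch of the hazard arithmetic described above. Stage numbers are taken
# from the recollection in the comment (prefetch = -1, pipeline stages 0..5)
# and are not checked against a 5K manual. ssnops_needed is hypothetical.

PREFETCH_STAGE = -1
LAST_STAGE = 5


def ssnops_needed(effect_stage, consumer_stage):
    """Cycles all ALUs must be kept busy so an instruction that needs a result
    in consumer_stage sees an effect that only lands in effect_stage.
    Assumption: no padding is needed if the effect lands before it is needed."""
    return max(0, effect_stage - consumer_stage)


# Worst case: a store whose target will later be fetched as an instruction.
n = ssnops_needed(LAST_STAGE, PREFETCH_STAGE)
print(n)                         # 6
print("\n".join(["ssnop"] * n))  # six ssnop lines to drop into the sequence
```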

Note that this is unrelated to interlocks, as any modified Harvard architecture requires some sort of synchronization when the instruction stream is changed. However, most ISAs have a single instruction that stalls the pipeline and discards any prefetched instructions (e.g. isync on Power). MIPS added one in later revisions of the ISA as well.

Another fun thing was that there was no interrupt-safe way to disable interrupts: the interrupt-enable bit was in a word-sized register along with other values that an ISR could legitimately change, so if an interrupt arrived between reading that register and writing it back with the bit cleared, the ISR's changes were overwritten. This was also fixed in later revisions of the ISA.
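A toy model of that race: reading the status word, clearing the enable bit, and writing it back are separate steps, so an ISR that runs in between has its update silently overwritten. Bit positions and names here are illustrative assumptions, not the real Status register layout. (MIPS32 Release 2's `di`/`ei` instructions are one example of the later fix: they do this update atomically.)

```python
# Toy model of a non-atomic "disable interrupts": read the status word, clear
# the enable bit, write the word back. Bit positions are illustrative only,
# not the real MIPS Status register layout.

IE_BIT = 1 << 0     # interrupt-enable bit (illustrative position)
ISR_FLAG = 1 << 8   # another field an ISR may legitimately change (illustrative)


class Cpu:
    def __init__(self):
        self.status = IE_BIT

    def isr(self):
        self.status |= ISR_FLAG  # the handler updates a field in the same word


def disable_interrupts(cpu, interrupt_between=False):
    value = cpu.status                 # read the word (mfc0 on real hardware)
    if interrupt_between and value & IE_BIT:
        cpu.isr()                      # interrupt taken between the read and the write
    cpu.status = value & ~IE_BIT       # write back (mtc0): the stale copy wins


cpu = Cpu()
disable_interrupts(cpu, interrupt_between=True)
print(bool(cpu.status & IE_BIT), bool(cpu.status & ISR_FLAG))  # False False: ISR's update lost
```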