The main point of RISC architectures is that they are trivially pipelineable to ...

wolfgke · on Oct 18, 2018

> The main point of RISC architectures is that they are trivially pipelineable

This was the idea behind the original MIPS (the textbook example of a RISC processor - both literally and metaphorically). Unfortunately, this led to the problem that details of the internal implementation leaked into the instruction set. Just google for 'MIPS "delay slot"'. When later implementations of MIPS no longer needed this delay slot, you still had to pay attention to this obsolete detail when writing assembly code.
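To make the delay slot concrete, here is a toy sketch in Python. It is not real MIPS: the tuple-based "instructions", the `run` function and the `delay_slot` switch are all made up for illustration. It only shows the visible effect, namely that the instruction placed right after a branch is executed even though the branch is taken.

```python
# Toy model of a MIPS-style branch delay slot (hypothetical encoding, not real MIPS).
# An instruction is a tuple: ("op", label) does some work,
# ("b", index) is an unconditional branch to the given program index.
# Assumption: the delay slot never holds another branch.

def run(program, delay_slot=True):
    """Return the labels of the non-branch instructions in execution order."""
    trace = []
    pc = 0
    while pc < len(program):
        kind, arg = program[pc]
        if kind == "b":
            if delay_slot and pc + 1 < len(program):
                # The instruction following the branch is already in the
                # pipeline, so it executes before control reaches the target.
                trace.append(program[pc + 1][1])
            pc = arg
        else:
            trace.append(arg)
            pc += 1
    return trace


program = [
    ("op", "A"),
    ("b", 4),           # branch to index 4
    ("op", "SLOT"),     # sits in the delay slot
    ("op", "SKIPPED"),
    ("op", "TARGET"),
]

with_slot = run(program, delay_slot=True)
without_slot = run(program, delay_slot=False)
```

With the delay slot, "SLOT" runs between "A" and "TARGET"; without it, "SLOT" is skipped like any other instruction after a taken branch. Assembly written for the first behaviour has to keep working on every later implementation, which is exactly the leak described above.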

The lesson that was learned is that implementation details should not leak into the instruction set.

Next: what kind of pipeline are we even talking about? It is often very convenient to offer multiple kinds of pipelines depending on the intended usage of the processor. For example, for low-power or realtime applications, an in-order pipeline is better suited. On the other hand, for high-performance applications, an out-of-order pipeline is better suited. ARM, for instance, offers multiple different IP cores for the same instruction set with different pipelines.

Finally, note that the more regular and easier-to-decode instruction sets of typical RISC CPUs (ARM is explicitly not a typical one in this sense, in particular considering T32) often lead to bigger code than, say, x86. This turned into a problem when CPUs became much faster than memory (indeed, some people say this was an important reason why people today are much more critical of RISC). This is also the reason why RISC-V additionally provides the optional "“C” Standard Extension for Compressed Instructions" (RVC). Take a look at

> https://riscv.org/specifications/

The authors claim at the beginning of chapter 12 of the "User-Level ISA Specification": "Typically, 50%–60% of the RISC-V instructions in a program can be replaced with RVC instructions, resulting in a 25%–30% code-size reduction."
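The two ranges in that quote are consistent with each other if you assume every replaced instruction shrinks from 32 to 16 bits. A quick check in Python (the function name and parameters are mine, not from the spec):

```python
from fractions import Fraction

def code_size_reduction(fraction_compressed, full_bits=32, compressed_bits=16):
    """Fraction of code size saved when `fraction_compressed` of the
    full-width instructions are replaced by compressed ones.
    Assumes all original instructions are `full_bits` wide."""
    saved_per_instruction = Fraction(full_bits - compressed_bits, full_bits)
    return Fraction(fraction_compressed) * saved_per_instruction


low = code_size_reduction(Fraction(50, 100))   # 50% of instructions replaced
high = code_size_reduction(Fraction(60, 100))  # 60% of instructions replaced
```

Each replaced instruction saves half its size, so replacing 50% to 60% of the instructions saves 25% to 30% of the code size.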

> 3-operand arithmetics and zero register simplifies hazard detection

Despite the 3-operand format of ARM, at least the A32 and T32 instruction sets offer 2 additional parts for many instructions:

1. conditional execution: for example, ADDNE is only executed when the Z(ero) flag is not set. There are 15 variants for conditional execution (including "always").

2. "S" suffix for many instructions: causes the instruction to update the flags. For example, SUBS causes the processor to update the flags while SUB does not.
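A rough Python model of these two parts, in case the mnemonics are unfamiliar. This is a sketch under heavy simplification: the `Cpu` class and its `alu` method are invented for illustration, only the Z flag is modelled, and only the AL, EQ and NE conditions are covered.

```python
MASK = 0xFFFFFFFF  # keep results to 32 bits

class Cpu:
    """Hypothetical, heavily simplified A32-like core: 16 registers, Z flag only."""

    def __init__(self):
        self.r = [0] * 16
        self.z = False  # Z(ero) flag

    def _cond_passes(self, cond):
        # Only three of the condition codes are modelled here.
        return {"AL": True, "EQ": self.z, "NE": not self.z}[cond]

    def alu(self, op, rd, rn, rm, cond="AL", s=False):
        """Execute `op rd, rn, rm`; cond is the condition suffix, s the S suffix."""
        if not self._cond_passes(cond):
            return  # instruction behaves like a no-op
        a, b = self.r[rn], self.r[rm]
        result = (a + b if op == "ADD" else a - b) & MASK
        self.r[rd] = result
        if s:
            self.z = (result == 0)  # only S-suffixed instructions touch the flags


cpu = Cpu()
cpu.r[1], cpu.r[2] = 5, 5
cpu.alu("SUB", 0, 1, 2, s=True)     # SUBS r0, r1, r2 -> result 0, Z is set
cpu.alu("ADD", 3, 1, 2, cond="NE")  # ADDNE r3, r1, r2 -> skipped, Z is set
skipped_r3 = cpu.r[3]

cpu.r[2] = 3
cpu.alu("SUB", 4, 1, 2)             # SUB r4, r1, r2 -> result 2, Z unchanged (still set)
z_after_plain_sub = cpu.z
cpu.alu("SUB", 0, 1, 2, s=True)     # SUBS r0, r1, r2 -> result 2, Z is cleared
cpu.alu("ADD", 3, 1, 2, cond="NE")  # ADDNE r3, r1, r2 -> executed, r3 = 8
```

The point is that the plain SUB leaves Z alone, so a conditional instruction can depend on a flag that was set several instructions earlier.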

The conditional execution was to my knowledge dropped in ARM64 because branch predictors got good enough.

So: ARM has other things in the instruction set to avoid pipeline stalling. 3-operand instructions are not among them. Rather, the reason for 3-operand instructions is that this instruction format allows the compiler to generate efficient code much more easily.

dfox · on Oct 18, 2018

The stall detection logic remark was meant in the context of a traditional MIPS-style in-order, single-issue pipeline executing a regularly encoded instruction set, where the mentioned features lead to both a smaller implementation of the detection logic itself (which for the traditional MIPS is the bulk of the control logic) and simpler routing of the signals involved.
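As an illustration of how small that logic can be, here is a sketch of the textbook load-use interlock in Python. The assumptions are mine: a classic 5-stage pipeline with full forwarding, so that only a load followed directly by a consumer needs a stall, and an `Instr` tuple with fixed `rd`/`rs1`/`rs2` fields standing in for the regular encoding.

```python
from collections import namedtuple

# Hypothetical decoded instruction: fields sit at fixed positions, so the
# hazard check can compare them without first looking at the opcode.
Instr = namedtuple("Instr", "op rd rs1 rs2")

ZERO_REG = 0  # reads as zero, writes are discarded

def needs_stall(in_execute, in_decode):
    """Load-use interlock: stall when the instruction in decode reads the
    register that a load currently in execute is about to write."""
    if in_execute.op != "load":
        return False  # assumed: ALU results reach the consumer via forwarding
    if in_execute.rd == ZERO_REG:
        return False  # a write to the zero register never creates a dependency
    return in_execute.rd in (in_decode.rs1, in_decode.rs2)


load_r1 = Instr("load", rd=1, rs1=2, rs2=0)
use_r1 = Instr("add", rd=3, rs1=1, rs2=4)
no_use = Instr("add", rd=3, rs1=5, rs2=4)
load_r0 = Instr("load", rd=0, rs1=2, rs2=0)
use_r0 = Instr("add", rd=3, rs1=0, rs2=4)
```

In hardware this amounts to a couple of register-number comparators and a zero check. With 2-operand instructions or irregular field positions, the comparators would first need to know which bits name the sources and the destination.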

On the other hand, I completely agree that MIPS-style delay slots are simply a bad idea. But for me, ARM's conditional execution and single flags register are a similarly bad idea that stems from essentially the same underlying thought.