Hacker News new | past | comments | ask | show | jobs | submit login

> As you kinda said, almost everything about the 1980s definition of RISC has ceased to be true.

A central difference that still exists that RISC processors are typically load/store architectures. That means that before an operand that exists in memory can be used, it has to be transfered to a register.

This means that an instruction like

add eax, [ecx]

does not work, say, under ARM. Under ARM, you have to use

  ldr r1, [r1]
  add r0, r0, r1
Intel found out that using memory addresses both as source and target turned out to be a bad idea

  add [ecx], eax
(since it needs 3 phases: load value from memory, do instruction, store back). No such instructions thus exist in MMX, SSE..., AVX..., ... On the other hand, Intel still believes that using a memory operand as source only is quite a good idea on x86 (look at the encoding of SSE..., AVX..., AVX-512). Nevertheless: having the capability to do such a complicated instruction atomically is very useful for multithreading; consider for example

  lock add [ecx], eax
which adds eax to the memory address in ecx atomically.

Also, a very typical distinction (that Intel only dropped with AVX on) is that CISC CPUs typically use 2 operands per instruction (of which one may be memory) and RISC CPUs have 3-operand instructions. So

  add r0, r1, r2
works on ARM, but under x86, only instructions that were introduced from AVX on (i.e. use a VEX (VEX2 or VEX3) or EVEX prefix (AVX-512); I have to look up whether something like that is also possible with a XOP prefix) have this capability.

Also very often, CISC instruction sets offer complicated addressing modes, such as in x86

mov edx, [ecx+4*eax]

It is not completely clear whether this is worth the complexity or not. On one hand, such instructions are hard to use for a compiler (which is the central reason why they were abolished in RISC architectures). On the other hand, skilled programmers can use them to write quite elegant and fast code.

TLDR: A central difference that still exists is that

- RISC architectures are load-store architectures

- on CISC architectures 2 operands (1 can be memory address) are typically used and "feel more natural"

- on RISC architectures, instructions typically have 3 operands.

- CISC architectures often support much more different and complicated addressing modes than CISC.




The main point of RISC architectures is that they are trivially pipelineable to the extent that making non-pipelined implementation does not make much sense. All the architecture visible differences from CISC are motivated by that. Load-store gets you well defined subset of instructions that access memory and have to be handled specially, 3-operand arithmetics and zero register simplifies hazard detection and result forwarding logic and so on.


> The main point of RISC architectures is that they are trivially pipelineable

This was the idea behind the original MIPS (the textbook example of a RISC processor - both literally and metaphorically). Unluckily this lead to the problem that implementation details of the internal implementation leaked into the instruction set. Just google for 'MIPS "delay slot"'. When in later implementations of MIPS, this delay slot was not necessary anymore, you still had to pay attention to this obsolete detail when writing assembly code.

The lesson that was learned is that implementation details should not leak into the instruction set.

Next: About what kind of pipeline are we even talking about? It is often very convenient to offer multiple kinds of pipelines dependent on the intended usage of the processor. For example for low-power or realtime applications, an in-order pipeline is better suited. On the other hand, for high-performance applications, an out-of-order pipeline is better suited. For example ARM offers multiple different IP cores for the same instruction set with different pipelines.

Finally, pay attention to the fact that more regular and more easy to decode instruction set of typical RISC CPUs (ARM is explicitly not a typical one in this sense, in particular considering T32) often leads to bigger code than, say, x86. This turned into a problem when CPUs became much faster than the memory (indeed some people say, this was an important reason why people today think much more critical about RISC). This is also the reason why RISC-V additionally provides the optional "“C” Standard Extension for Compressed Instructions" (RVC). Take a look at

> https://riscv.org/specifications/

The authors claim in the beginning of chapter 12 of "User-Level ISA Specification": "Typically, 50%–60% of the RISC-V instructions in a program can be replaced with RVC instructions, resulting in a 25%–30% code-size reduction.".

> 3-operand arithmetics and zero register simplifies hazard detection

Despite the 3-operand format of ARM, at least the A32 and T32 instruction sets offer 2 additional parts for many instructions:

1. conditional execution: for example ADDNE is only executed when the Z(ero) flag is not set. There are 15 variants for conditional execution, including "always").

2. "S" suffix for many instruction: causes the instruction to update the flags. For example SUBS causes the processor to update the flags while SUB does not.

The conditional execution was to my knowledge dropped in ARM64 because branch predictors got good enough.

So: ARM has other things in the instruction set to avoid pipeline stalling. 3-operand instructions are not among of them. The reason for 3-operand instructions rather is that this instruction format allows the compiler to generate efficient code much more easily.


The stall detection logic remark was meant in the context of traditional MIPS-style in-order single-issue pipeline executing regularly encoded instruction set where the mentioned features lead to both smaller implementation of the detection logic itself (which for the traditional MIPS is the bulk of the control logic) and simpler routing of the signals involved.

On the other hand I completely agree that MIPS-style delay slots are simply bad idea. But for me ARM's conditional execution and singular flags register is similarly bad idea that stems from essentially same underlying thought.


Damn, you are right. I thought the load-store architecture was no more in ARM thumb 2. I was wrong. Thanks for the info.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: