> Compilers emit lea on x86-64 these days to save ports and you think they’ll us...

saagarjha · on June 19, 2023

But there’s inherently more work. You need to keep track of some extra state and when the overflow actually occurs you need to unwind processor state and deliver the exception. You can make this cheap but it definitely cannot be free. From the words you’re using I feel like you have a model in your head that if you can just encode something into an instruction it’s now fast and that instructions are the way we measure how “fast” something is, but that’s not true. Modern processors can retire multiple additions per cycle. What this will probably look like is both of them are single instructions and one of them has a throughput of 4/cycle and the other one will be 3/cycle and compiler authors will pick the former every time.

codedokode · on June 19, 2023

> Modern processors can retire multiple additions per cycle.

Then add multiple overflow checking units.

> one of them has a throughput of 4/cycle and the other one will be 3/cycle and compiler authors will pick the former every time.

Currently on RISC-V checked addition requires 4 dependent instructions, so its throughput is about 1 addition/cycle.

brucehoult · on June 19, 2023

> Then add multiple overflow checking units.

You can't.

With your favoured ISA style you can't just put 4 or 8 checked overflow add instructions in a row and run them all in parallel because they all write to the same condition code flag. You have to put conditional branches between them.

Or, if you want an overflowing add to trap then you can't do anything critical in the following instructions until you know whether the first one traps or not e.g. if the instructions are like "add r1,(r0)+; add r2,(r0)+; add r3,(r0)+; add r4,(r0)+". In this example you can't write back the updated r0 value until you know whether the instruction traps of not. Even worse if you reverse the operands and have a RMW instruction.

codedokode · on June 19, 2023

> With your favoured ISA style you can't just put 4 or 8 checked overflow add instructions in a row and run them all in parallel because they all write to the same condition code flag.

This can be implemented using traps, without flags. And RISC-V supports delayed exceptions, which makes the implementation easier.

> In this example you can't write back the updated r0 value until you know whether the instruction traps of not.

RISC-V supports delayed exceptions, so you actually can.