I never actually wrote programs with RISC-V vector instructions, only for other ISAs, so you seem to have much more experience than me on that. Thanks for sharing your experience, and for the discussion.
> "multiple" doesn't imply "not fixed".
But in the case of RISC-V, it really is "not fixed".
For example, can you tell me the number of cycles it takes to execute this instruction: `vadd.vv v3, v1, v2`?
Another example, in your implementation of memcpy:
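(What follows is a sketch of that kind of stripmined loop in RVV 0.7.1-style assembly, based on the example in the vector spec; the exact register choices and vsetvli arguments are assumptions on my part, not the code actually posted.)

```
# void *memcpy(void *dest, const void *src, size_t n)
# a0 = dest, a1 = src, a2 = n
memcpy:
    mv      a3, a0          # keep dest for the return value
loop:
    vsetvli a4, a2, e8,m8   # a4 = number of byte elements handled this pass
    vlb.v   v0, (a1)        # load a4 bytes from src
    add     a1, a1, a4      # advance src pointer
    sub     a2, a2, a4      # decrement remaining count
    vsb.v   v0, (a3)        # store a4 bytes to dest
    add     a3, a3, a4      # advance dest pointer
    bnez    a2, loop        # any bytes left?
    ret
```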
From what I understand, you can't tell how long each vlb.v or vsb.v will take on a given processor, as it depends on the number of elements that will be returned in a4. The last iteration of the loop, for example, will probably take less time than the others, because the instructions will only have to load and store the trailing part of the data.
> If you look at the result of a vectorised memcpy() I did on the D1, you can see that the execution time is absolutely identical at 30.9 ns (31 clock cycles at 1.008 GHz) from sz=0 to sz=64
Again, I'm not arguing against the efficiency of the RISC-V vector extension. I'm sure any processor designer will be able to implement it as efficiently as if the instructions were closer to the raw compute logic.
I'm not even arguing against the extension. If anything, I think I would prefer it if an open CISC ISA were gaining popularity as an open-source platform instead of RISC-V, because it would have the same portability implications, but with smaller code size and memory footprint, while leaving a lot of room for innovation and optimization in the actual design.
What I actually argue against is calling it a RISC ISA, because while the core of the ISA is RISC, this vector extension is not.
Actually I can. I can tell you that on the chip I'm currently using "vadd.vv v3, v1, v2" takes 3 clock cycles for LMUL=1, 6 clock cycles for LMUL=2, 12 clock cycles for LMUL=4, and 24 clock cycles for LMUL=8.
You always have to know what the current LMUL is, otherwise you can end up using illegal register numbers. For example, at LMUL=8 you can only use v0, v8, v16, and v24. Anything else causes an exception.
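To make that concrete, here is a small sketch (again in 0.7.1-style assembly, so the exact vsetvli arguments are an assumption) of how the same vadd.vv opcode covers one or eight registers' worth of elements depending on the LMUL configured beforehand, and of the register-number restriction at LMUL=8:

```
    vsetvli t0, a0, e32,m1   # LMUL=1: each vector register forms its own group
    vadd.vv v3, v1, v2       # operates on a single register's worth of elements

    vsetvli t0, a0, e32,m8   # LMUL=8: vector registers are grouped in eights
    vadd.vv v16, v0, v8      # legal: v0, v8, v16, v24 are the only valid group numbers
    # vadd.vv v3, v1, v2     # would trap here: v1/v2/v3 are not multiples of 8
```

Same instruction bytes, different amount of work, which is exactly why the cycle count scales with LMUL.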
"From what I understand, you can't tell how long each vlb.v or vsb.v will take on a given processor, as it depend on the number of elements that will be returned in a4. The last iteration of the loop for example will probably take less time than the others, because the instructions will only have to load and store the trailing part of the data."
If you look at the link I provided, you will see that's not the case. The memcpy() function takes absolutely identical time for any memcpy() length from 0 bytes to 64 bytes. It does not depend on the number of active elements. At least on this chip, which is currently the only chip in the world you can get your hands on with a RISC-V Vector unit.
On some other chip it might take a different amount of time. But on this one, and probably many others, it takes the same amount of time.
"I think I would prefer if an open CISC ISA [...] with smaller code size and memory footprint"
CISC ISAs do not have smaller code size.
The two most compact modern full-featured ISAs, in terms of code size, are ARMv7 and RISC-V. They are smaller than i686 by quite a margin.
In 64 bit there is absolutely no competition: RISC-V has the smallest code, with ARMv8 and AMD64 quite similar to each other but significantly bigger.
Just look at the same programs compiled for each one and you'll see. I suggest something like Ubuntu 21.04, which is available for all three. Take a look in /bin and /usr/bin and run "size" on the binaries. It's indisputable. RISC-V is the clear winner in 64 bit ISAs. ARMHF is similar or a little bit more compact in 32 bit. CISC x86 isn't close in either case.
> "multiple" doesn't imply "not fixed".
But in the case of risc-v, it really is "not fixed". For example, can you tell me the number of cycle it takes to execute this instruction: `vadd.vv v3, v1, v2` ?
Another example, in your implementation of memcpy:
``` 0: 86aa mv a3,a0
0000000000000002 <.L1^B1>:
```From what I understand, you can't tell how long each vlb.v or vsb.v will take on a given processor, as it depend on the number of elements that will be returned in a4. The last iteration of the loop for example will probably take less time than the others, because the instructions will only have to load and store the trailing part of the data.
> If you look at the result of a vectorised memcpy() I did on the D1, you can see that the execution time is absolutely identical at 30.9 ns (31 clock cycles at 1.008 GHz) from sz=0 to sz=64
Again, I'm not arguing against the efficiency of RISC-V vector extension. I'm sure any processor designer will be able to implement it as efficiently as if the instructions were closer to the raw compute logic.
I'm not even arguing against the extension. If anything, I think I would prefer if an open CISC ISA was gaining popularity as an open source platform instead of RISC-V, because it would have the same portability implications, but with smaller code size and memory footprint, and while leaving a lot of room for innovation and optimization on the actual design.
What I actually argue against is calling it a RISC ISA, because while the core of the ISA is RISC, this vector extension is not.