Some context: RISC-V Summit is next week, and RISC-V international has just approved a batch of important extensions[0]. With these extensions, RISC-V is not missing anything relative to ARM and x86 ISAs in terms of functionality.
I expect a lot of tape-outs to happen this month, as core vendors were probably holding for the announced ratifications, in fear of last minute changes. Next year is going to be exciting.
I wouldn't say RISC-V isn't missing anything. The lack of add/subtract-with-carry is an issue for efficient runtimes of many JITed languages like JavaScript.
That being said, I don't think it's the worst thing in the world, as some do. The focus now should be on compiled code, since JITs by definition can make runtime decisions about whether some future extension that fixes this deficiency exists. The J extension has stalled for the moment, but with these other extensions ratified there should hopefully be more bandwidth available.
Maybe, but it's a leap, IMO. The equivalent patterns are 3x as long, and they modify a lot of architecturally visible state for their intermediate results, which leaves more work for the fused instructions to do.
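To make the complaint concrete, here's a hedged sketch in C (illustrative, not from any particular JIT): a multi-word add, the bignum/JS-number building block. On x86-64 the high half is one `adc`; RISC-V has no flags register, so the carry must be recomputed explicitly with the `sltu` idiom (`sum < addend` implies the add wrapped).

```c
#include <stdint.h>

/* Add two 128-bit values held as 64-bit limbs.
   On x86-64 this lowers to add + adc; on RISC-V the carry is
   rematerialized with an extra compare (the sltu idiom below). */
void add128(uint64_t a_lo, uint64_t a_hi,
            uint64_t b_lo, uint64_t b_hi,
            uint64_t *r_lo, uint64_t *r_hi) {
    uint64_t lo = a_lo + b_lo;
    uint64_t carry = lo < a_lo;   /* 1 iff the low add wrapped */
    *r_lo = lo;
    *r_hi = a_hi + b_hi + carry;
}
```

The extra compare (and the register holding the carry) is exactly the architecturally visible intermediate state mentioned above.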
The complaint is valid, IMO, and the issue would have shown up in the filtering process they used to select ops if they had been working with JIT output too, rather than just what's in AOT code.
It can try... but you're basically trying to "decompile" or "compress" code to a higher level, and that's neither easy nor efficient. If something relatively simple like ADC is difficult, think of something like an entire encryption/hash round, which competing CISC processors already have dedicated instructions for. And even if you do manage to make that work, there's still the matter of those extra instructions taking up valuable space in caches and memory bandwidth.
Hence why I don't think "RISC is the future" unlike a lot of other proponents; I think a CISC with uop-based decoding will be more scalable and performant. Even ARMs have moved a little in that direction.
Classic CISC processors like the VAX had lots of memory-to-memory instructions, complex looping constructs, etc. Special ops that are register-to-register aren't anti-RISC.
> Can't vendor's making desktop/mobile class CPUs detect the equivalent pattern and optimize it in microcode or silicon?
The riscv stans keep saying that, but nobody has given a demo or shown benchmarks afaik, even under simulation. So it's just handwaving.
It's not only JavaScript, of course. Integer overflow in C is an error condition (undefined behaviour) that compilers usually don't try to trap. The -ftrapv option in gcc and clang enables trapping at some performance cost, so it's rarely used, and we get continuing bugs and vulnerabilities as a result. (Ada mandates trapping unless you enable an unsafe optimization, which is, um, enabled by default in GNAT.) RISC-V increases that performance cost considerably from what I can tell. That's the opposite of what we needed.
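For reference, the check a compiler must emit looks roughly like this sketch using the GCC/Clang overflow builtin. On a flag-based ISA it lowers to an add plus a branch-on-overflow; RISC-V needs additional compare instructions to reconstruct the overflow condition, which is the cost being discussed.

```c
#include <stdint.h>
#include <stdbool.h>

/* Checked 32-bit add, the kind of test -ftrapv-style codegen inserts.
   Returns true and stores the sum if no signed overflow occurred. */
bool checked_add(int32_t a, int32_t b, int32_t *out) {
    return !__builtin_add_overflow(a, b, out);
}
```

A real -ftrapv build would call abort() instead of returning false, but the per-operation check is the same.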
I'm no CPU architect, but I know CPUs are able to signal overflow in floating-point arithmetic, since IEEE 754 requires it. So I don't understand why they can't do it for integers.
Isn't the obvious solution to the overflow problem to define the behavior, like pretty much all newer languages did (presumably because they learned from the mistakes made in C)?
If you only want safety, then trapping or not, signalling or not, does not matter at all. It is UB that causes safety problems, not the overflow itself. And RISC-V mandates how overflow is handled. No UB.
Throwing on arithmetic overflow is a language choice. And at least Rust's position is that trapping on every arithmetic operation is not necessary for security.
The only related problem with the lack of overflow trapping is that dynamically typed languages need numeric type conversion on overflow. But TBH, if a numerical JavaScript program often generates 1.7E308, then it's a terrible program that no one should care about.
Interestingly, the MIPS CPU traps on overflow for the add and sub instructions. You have to use the addu or subu instructions to get the usual wrapping behavior on overflow.
Faster than ARM's Cortex-A77: https://www.phoronix.net/image.php?id=2021&image=sifive_p650... . Performance comparable to Apple's Icestorm architecture, the 'efficiency' cores in the M1. Considering the Cortex-A710 is the fastest ARM core currently available and its successor will only be available next year, SiFive is just a few years away from real competition in an arena currently dominated by ARM.
It will be interesting to see a comparison of power efficiency as well as performance. RISC-V implementations have shown a pretty sizeable advantage in power use in the past, and we don't quite know how that advantage carries over to these larger, performance-focused designs.
Sure, have you actually used one? There are some challenges with the software support of RPis, especially the model 4 and its GPU drivers. I would like to see a platform (potentially RISC-V) with great software support, so I could finally use one of these devices as a replacement for TV set-top boxes running Android.
Because a CPU architecture does not exist in a vacuum, and RISC-V's marketing is about being open, I expect they are going to have the rest of the SoC as open as well. I guess it will be easy enough to have all the drivers as part of Linux.
such as the $30 Sparkfun Red or the $20 Lofive boards. Those are for running an RTOS, not Linux, but they compete with Arduino, mbed, teensy, and other ARM Cortex M series microcontrollers.
A price target of $10 is something you'll only hit with massive scale-up.
That seems to imply a certain integer arithmetic performance, but I wonder what the floating point performance is. They could have just said "X flops".
Comparing to the other benchmarks at [1], I have no idea, because those all report raw totals rather than normalized per-GHz, per-core results. Nice reporting.
How fast is this thing? Pentium? First-gen i3? Current-gen Ryzen 5? The fact that they are being so obtuse about it leads me to believe performance isn't great.
There are several vendors besides SiFive offering RISC-V cores for licensing. There are even some OSHW cores that can be freely used.
Even if we choose to ignore the technical prowess of being a true 5th generation RISC ISA built with hindsight no other ISA has, what's IMHO a big deal in RISC-V is the mere availability of this market of cores.
It poses a threat to ARM's business model, where ARM licenses cores and the ISA, but nobody other than ARM can license cores to others.
Why do all the riscv fans conveniently ignore aarch64 when they make statements like this?
It was in fact a completely clean new design, based on hindsight, by people who know what they are doing, and with no legacy cruft.
Aarch64 obviously isn't a completely clean sheet design. It was constrained by having to execute on the same CPU pipelines as 32 bit code, at least for the first decade or so. And the 32 bit mode has to perform well. There are tens of millions of Raspberry Pi 3s and 4s (and later model Pi 2s) which have 64 bit CPUs but have never seen a 64 bit instruction in their lives. Android phones have been supporting both 32 and 64 bit apps for a long time.
The "by people who know what they are doing" thing is just pure FUD. Sure, ARM employs some competent people, but no more so than IBM, Intel, AMD or the various members of RISC-V International.
I'm a fan of RISC-V but the freedom is a large part of it. Aarch64 is a very well designed ISA and clearly has a lot of benefit of hindsight. The load pair/store pair instructions, the addressing modes, fixed 32-bit instruction size, etc. It all really helps. I suspect that Apple was actively part of designing it.
I think however that RISC-V isn't that much worse and because of the freedom we will almost certainly see more implementation of RISC-V. I'd be watching Tenstorrent, SiFive, Rivos, Esperanto, and maybe Alibaba/T-Head.
> Why do all the riscv fans conveniently ignore aarch64 when they make statements like this? It was in fact a completely clean new design, based on hindsight, by people who know what they are doing, and with no legacy cruft.
aarch64 seems poorly designed to me.
ARMv7 had Thumb, but for some reason ARMv8 did not incorporate any lessons from it. As a result, code density is bad; ARMv8 binaries are huge.
ARMv9, to be available in chips next year, is just a higher profile of required extensions, and does nothing to fix that.
Ever wonder why M1 needs such huge L1 cache? Well, now you know.
Considering ARMv9 will be competing against RVA22, I don't have much hope for ARM.
ARMv8 code density is quite good for a fixed-length ISA and is of course much better than that of RISC-V.
RISC-V has only one good feature for code density, the combined compare-and-branch instructions. But even this feature was designed poorly, because it does not provide all the kinds of compare-and-branch that are needed: if you want safe code that checks for overflows, the number of required instructions and the code size explode. Only unsafe code, without run-time checks, can have an acceptable size in RISC-V.
ARMv8 has an adequate unused space in the branch opcode map, where combined compare-and-branch instructions could be added, and with a larger branch offset range than in RISC-V, in which case the code size advantage of ARMv8 vs. RISC-V would increase significantly.
While the combined compare-and-branch instructions of RISC-V are good for code density, because branches are very frequent, the rest of the ISA is bad, and the worst part is the lack of indexed addressing, which frequently requires 2 RISC-V instructions instead of 1 ARM instruction.
>in which case the code size advantage of ARMv8 vs. RISC-V would increase significantly.
Many things could be said about ARMv8, but that it has good code size is not one of them. It does, in fact, have abysmal code density. Both RISC-V and x86-64 produce significantly smaller binaries. For RISC-V, we're talking about a 20% reduction in size.
There's a wealth of papers on this, but you can verify it trivially yourself, either by compiling binaries for different architectures from the same sources, or by comparing binaries in Linux distributions that support both RISC-V and ARM.
>where combined compare-and-branch instructions could be added, and with a larger branch offset range than in RISC-V
If your argument is that ARMv8 could get better over time, I hate to be the bearer of bad news. ARMv9 code density isn't any better.
>and the worst is the lack of indexed addressing, which frequently requires 2 RISC-V instructions instead of 1 ARM instruction.
These patterns are standardized, and they become one instruction after fusion.
RISC-V, unlike the previous generation of ISAs, was thoroughly designed with macro-op fusion in mind. The simplest microarchitectures can of course omit it altogether, but the cost of fusion in RISC-V is low; I have seen it quoted at 400 gates.
Instruction fusion is a possibility for the future, which has been discussed academically, but no one implements it at present. I'm not sure anyone will -- it's too much complexity for simple cores, and not needed for big OoO cores.
The one fusion implementation I'm aware of is the SiFive 7-series combining a conditional branch that jumps forward over exactly one instruction with that instruction. It turns the pair into predicated execution.
I agree with everything else. In particular the code density. Anyone can download Ubuntu or Fedora images for the same release for amd64, arm64, and riscv64. Mount them and run "size" on any selection of binaries you want. The RISC-V ones are consistently and significantly smaller than the other two, with arm64 the biggest.
I'm not sure how you missed RISC-V's big feature for code density -- the "C" extension, giving it arbitrarily mixed 16 and 32 bit opcodes.
I've heard of that feature before somewhere else. It gave the company that invented it unparalleled code density in their 32 bit systems and propelled them to the heights of success in mobile devices. What was their name? Wait .. oh, yes ... ARM.
Why they forgot this in their 64 bit ISA is a mystery. The best theory I can come up with is that they thought the industry had shaken out and amd64 was the only competition they were going to have, ever. Aarch64 does indeed have very good code density for a fixed-length 32 bit opcode ISA, and comes very close to matching amd64. They may have thought that was going to be good enough.
Note: the RISC-V "C" extension is technically optional, but the only CPU cores I know of that don't implement it are academic toys, student projects, and tiny cores for use in FPGAs where they are running programs with only a few hundred instructions in them. Once you get over even maybe 1 KB of code it's cheaper in resources to implement "C" than to provide more program storage.
Speaking about RISC-V, no it is not. In RISC-V, all "C" 16-bit instructions have 32-bit counterparts. When the front-end reads in an instruction word, it can expand each 16-bit op into its 32-bit counterpart and feed them serially to the decoder. So there's only one decoder that does the work for both 16-bit and 32-bit ops (it basically does not distinguish them), and that's also what makes macro-op fusion possible and easy to implement, unlike ARM's Thumb, which has two separate decoders, with all the consequences.
I'll surely read the ARMv4T specs when I have a bit more free time, thanks :). But ARM requires switching machine mode to select the instruction set (you cannot mix Thumb with regular 32-bit code), which hints that a selection between decoders takes place. In RISC-V, although it's up to the micro-architecture designer, only one decoder is needed, and you can have a mixture of 16-bit and 32-bit instructions in the program flow. What's more, with macro-op fusion, two consecutive "C" instructions can be viewed as one "unnamed" instruction that does a lot more work. More details from the RISC-V authors on the subject, along with benchmarks:
https://riscv.org/wp-content/uploads/2016/07/Tue1130celio-fu...
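The length-determination rule underlying that single-decoder design is trivial. Per the RISC-V encoding scheme, an instruction whose two lowest bits are both 1 is a full 32-bit instruction; anything else is a 16-bit compressed op. A minimal sketch of the test a front end applies while walking a mixed stream:

```c
#include <stdint.h>
#include <stddef.h>

/* RISC-V encoding rule: low two bits == 0b11 means a 32-bit
   instruction; any other value means a 16-bit "C" instruction.
   A front end can walk a mixed 16/32-bit stream with only this test. */
size_t insn_length_bytes(uint16_t first_halfword) {
    return (first_halfword & 0x3) == 0x3 ? 4 : 2;
}
```

This is why the expansion into 32-bit counterparts can happen before a single shared decoder, rather than requiring a second decode path.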
The thing with lack of shifted indexed addressing is that it just might not matter all that much beyond toy examples. Address calculations can generally be folded in with other code, particularly in loops which are a common case. So it's only rarely that you actually need those extra instructions.
Shifted indexed addressing is needed less often, but indexed addressing, i.e. register + register, is needed in every loop that accesses memory.
There are 2 ways of programming a loop that addresses memory with a minimum of instructions.
One way, which is preferable e.g. on Intel/AMD, is to reuse the loop counter as the index into the data structure that is accessed, so each load/store needs a base register + index register addressing, which is missing in RISC-V.
The second way, which is preferable e.g. on POWER and which is also available on ARM, is to use an addressing mode with auto-update, where the offset used in loads or stores is added into the base register. This is also missing in RISC-V.
Because neither of the two methods works in RISC-V with a minimum number of instructions, as it does in all other CPUs, all such loops, which are very frequent, need pairs of instructions in RISC-V corresponding to single instructions in the other CPUs.
A big difference here is that the extra RISC-V instructions are usually 16 bits in size, while the Aarch64 and POWER instructions are all 32 bits. So the code size ends up the same.
Also, high performance Aarch64 and POWER implementations are likely to be splitting those instructions into two decoupled uops in the back end.
Performance-critical loops are unrolled on all ISAs to minimise loop control overhead and also to allow scheduling instructions to allow for the several cycle latency of loads from even L1 cache. When you do that, indexed addressing and auto-update addressing are still doing both operations for every load or store which, as well as being a lot of operations, introduces sequential dependency between the instructions. The RISC-V way allows the use of simple load/store with offset -- all of which are independent of each other -- with one merged update of each pointer at the end of the loop. POWER and Aarch64 compilers for high performance microarchitectures use the RISC-V structure for unrolled loops anyway.
So indexed addressing and auto-update addressing give no advantage for code size, and don't help performance at the high end.
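The two loop shapes being debated, sketched in C (illustrative only): the indexed form wants a base+scaled-index load, which is one instruction on x86/ARM but two on RISC-V; the pointer-bump form uses plain offset loads with one pointer update, which is what compilers for high-performance targets emit anyway in unrolled loops.

```c
#include <stdint.h>
#include <stddef.h>

/* Indexed form: each access is base + (i << 3), i.e. reg+reg
   addressing with scaling, which RISC-V lacks. */
int64_t sum_indexed(const int64_t *a, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer-bump form: plain loads at a fixed offset from a pointer,
   with one pointer update per iteration (or per unrolled block). */
int64_t sum_bumped(const int64_t *a, size_t n) {
    int64_t s = 0;
    for (const int64_t *end = a + n; a != end; a++)
        s += *a;
    return s;
}
```

After unrolling, the bumped form also makes the loads independent of each other, which is the scheduling point made above.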
> for some reason ARMv8 did not incorporate any lessons from that.
I used to think so too, until I asked some more knowledgeable people about it. Turns out the lesson IS that not having it is better: fixed-size instructions make decoding significantly simpler, which makes it much easier to build very wide front ends.
A little easier, not much easier. A number of organisations are making very wide RISC-V implementations, and one has already published how their decoder works. It's modular, with each block looking at 48 bits of code (the first 16 overlapping with the previous block) and decoding either two 16 bit instructions, or one aligned 32 bit instruction, or one misaligned 32 bit instruction with a following 16 bit instruction, or one misaligned 32 bit instruction followed by an ignored start of another misaligned 32 bit instruction.
You can put as many of these modules side by side as you want. There is a serial dependency between them, in that each block has to tell the next block whether its last 16 bits are the start of a misaligned 32-bit instruction or not. That could become an issue at really extreme widths, but for something decoding e.g. 16 bytes at a time (4 to 8 instructions) it's not a problem.
There is a trade-off between a little bit of decoder complexity and a lot of improved code density -- but nowhere near to the same extent as say x86.
While I have no personal experience writing aarch64 assembler, my experience with ARM v6-M and v7-M makes me doubt your implied insult that ARM just failed or didn't give a fuck about their instruction set. Thumb-1 and Thumb-2 are well-designed instruction sets optimized for a certain kind of uarch. Almost all quirks exposed to the low-level programmer are there for good reasons, and while some of the constraints pose a challenge for compiler writers, they are not beyond the capabilities of GCC or LLVM.

There are several possible reasons for ARM to return to a fixed-length 32-bit encoding, e.g. to allow very wide OoO designs like Apple's Firestorm cores, or because the gain is smaller for 64-bit code with larger constants, better served by PC-relative constant pools. And while the quirky LDMIA function prologue is very flexible, appeals to me as an assembler programmer, and saves code space, having a single instruction potentially modify most integer registers as well as change the program counter and the active instruction set is hard to implement well, while the easier-to-implement register-pair load/store instructions are enough for most common instruction sequences. The tradeoff was different for in-order ARM2/3 CPUs with single-ported memory and a tiny unified cache (if that).
> Ever wonder why M1 needs such huge L1 cache? Well, now you know.
I'm not sure I follow this, but it reminds me to ask: does RISC-V allow for designs to have both efficiency & performance cores like the ARM big.LITTLE concept? Has anyone made one yet?
Of course you can do it. SiFive has been allowing customers to configure core complexes with a mixture of different core types for years -- for example mixing U84 cores with U74 or U54. If you want to do a big.LITTLE thing, with transferring a running program from one core type to another, that's just a software matter -- using cores with the same ISA but different microarchitectures.
To date the examples of this that have been shipped to the public have used cores with similar microarchitecture, but a different set of extensions.
For example the U54-MC in the HiFive Unleashed and in the Microsemi Polarfire SoC FPGAs use four U54 cores plus one E51 core for "real time" tasks. The E51 doesn't have an FPU or MMU or Supervisor mode. The U74-MC in the HiFive Unmatched is similar.
Alibaba's ICE SoC, which you may have seen videos of running Android, has two C910 Out-of-Order cores (similar to ARM A72/A73) implementing RV64GC, and a third C910 core that also has a vector processing unit with two pipes with 256 bit vector ALU each, plus 128 bit vector load and store pipes.
I find it amusing that RISC-V allegedly creates "fragmentation risk" when platform fragmentation in the ARM ecosystem already exists and it's painful enough -- at least that's what I recall from some comparisons with the x86/PC platform with respect to Linux kernel development.
They'll be fine if they focus on their microarchitectures rather than the ISA (where IMHO they've already lost), and make the process for obtaining a license much more streamlined; I've heard it takes no less than 18 months of long negotiations to license anything from ARM. That's not sustainable now that there's competition.
High performance implementations are possible even with bad ISAs, given enough resources.
x86-64 is much worse than ARM. It's a literal clusterfuck. And yet.
A high performance implementation of ARM, which is a much better ISA than x86-64, was something expected to happen sooner or later. It did not surprise me.
As far as OSHW cores go, it's so very nice to be able to throw something together in verilog and be able to inherit a compiler and not be trampling on someone else's copyright...
The press release does not say anything about a physical chip; it's about a licensable core that can be used to build SoCs. Here SiFive acts the same way ARM does: it sells cores.
Mind you that raw core performance is not everything, memory bandwidth and caches are crucial to make sure the CPU isn't waiting for data all the time.
I trust that the Apple benchmarks include all such effects. I'm less convinced that the RISC-V "projections" include them. SPECint2006 is supposed to be measured with real memory and an OS. Per-GHz numbers can't accurately reflect main memory latency, since its speed doesn't scale with the CPU clock.
Right, and "per GHz" numbers are also not very useful because you can't just crank up the GHz when you need performance. Even with the same process technology, you can't assume different microarchitectures will max out at the same frequency.
You're right, and remarkably Apple has found a major roadblock to clock speeds while using ARMv8.
M1's L1 cache is huge as a workaround for ARMv8's poor code density. A larger cache means lower clocks; unfortunately, there's no way around the speed of light.
How impressive that number is rather depends on how many GHz they're managing. In general the slower you design your clock to clock, the faster you can make all your caches. Plus the slower you clock your core, designed in or not, the lower the number of clock cycles it takes to talk to main memory.
They claim it's slightly faster than A77. That would have the IPC getting pretty close to AMD's Zen 1 chips (though probably at a lower peak frequency).
If I recall correctly, the SiFive Unmatched is still pretty slow compared to ARM (https://www.phoronix.com/scan.php?page=article&item=hifive-u...). Now, this board is not the one in question (P650), but we'll have to watch upcoming benchmarks, for which I recommend Phoronix.
Obviously you can't even think about comparing it with Intel and AMD yet, but when you look at the history of something like ARM (which I believe is 30-40 years old), RISC-V came a long way pretty fast, and the good thing is that it's a solid choice for the future due to being open.
Sweet, are there any resources on transitioning/migrating between x86_64 and riscv, or on the differences? Or are the ISAs so drastically different that it's better to just dive in head-first?
I played with Linux (Debian) for RV64 in QEMU and the experience was pretty painless, albeit slow, though I think that was mostly emulation overhead. The programs I wrote myself compiled and ran without a single issue. Obviously, if you need closed-source software, or software with build tools targeting an OS that doesn't run on RISC-V, then you would have a hard time. Most open-source software should be a breeze to port. A lot of the weirdness of the platform will likely be in bootloaders and drivers.
Absolutely. If you can go with OSS only, then it's a breeze. But if you depend on proprietary stuff like I do (CADs, Wine, etc.), then it's a pain. I'm currently trying to switch an office to ARM (RPi 4 and Baikal-M) and that is not easy.
[0]: https://riscv.org/announcements/2021/12/riscv-ratifies-15-ne...