RISC-V Pushes into the Mainstream (semiengineering.com)
183 points by PaulHoule on Dec 23, 2022 | 127 comments



I'm a big supporter of RISC-V, doing my best to buy the hardware where possible and use it.

The biggest hindrance to it is lack of software and distro support.

The VisionFive 2 is a lovely board, but so far we only have a barren boot ROM. No idea if OpenGL or OpenCL work on it!

Someone lied about distro support being confirmed. StarFive probably...

But those lies could torpedo the whole project.

The early days of Android were bogged down with complaints of fragmentation and patchy support.


>The biggest hindrance to it is lack of software and distro support.

https://debian.starfivetech.com/

Ubuntu is working on it as well. May already be done, I'm not sure.

As for software, see RISC-V Summit 2022—StarFive's Efforts in Fuelling RISC-V Software Ecosystem [0]

There is a lot of work being put into this, not only by StarFive but by other companies, the Chinese government (via the Chinese Academy of Sciences), researchers and hobbyists (once boards arrive).

[0]https://youtube.com/watch?v=VfA2HLibOOY


Just booted Debian on the VisionFive 2 tonight (flashed to an SD card from that very same link above).

Though 4K monitors don't work yet over HDMI... seems like some kind of bug...


Let me guess: issues coming from recent monitors expecting DRM?


I've heard that the first rev of the boards have some signal integrity issues that should be fixed in later revs.


>The VisionFive 2 is a lovely board, but so far we only have a barren boot ROM. No idea if OpenGL or OpenCL work on it!

It is the first Raspberry Pi-like <$100 RISC-V SBC, and it only started shipping to customers last week.

It is also the first board with a large run (long thousands vs short hundreds).

Maybe there's just one Debian image and little documentation on how to get it all up and running, but that's today.

In a month, it'll be a different story. That's the whole point of getting these into the hands of developers.


We're actually waiting for VF2 boards to be delivered and then we'll have a go at porting Fedora 37.


The Star64 (from Pine64) might have a better chance of being well supported.


With the same SoC, it should be about identical support-wise.

It just launched, give it time. The whole point of VisionFive 2 is precisely to get these boards to developers in large numbers so that it can all be bootstrapped.


It's been a while, but nothing from Pine64 was well supported without hacky third-party custom kernel rolls and the like, with no apparent effort to move stuff into mainline.

I hope that's changed?


Look at the PinePhone, Pinebook Pro, and RockPro64. All have mainline support and you can run several distros on them. I consider them the gold standard in this space, although it's thanks largely to the community and not the actual company. Anything newer than RK3399 will probably take a few years to become very good, but I'm sure mainline support is in the cards so long as the device is popular enough and Pine64 can sell enough. (Their PineCube IP camera device was a failure in recent years.) postmarketOS also does a lot in this space. Their developers often help to improve support for everyone on a board.


I got them all when they came out (trying to help support the concept, even if execution wasn't quite there yet). I haven't used them in a while, because it was such a shitshow. Glad to hear things have gotten better, might need to dig them out of the closet.


Is there a decent hardware manual even for the first VisionFive, say to the extent the original BeagleBoards had one (even if the Pis never did)? I wanted to write a (slow) emulator that could at least boot the Ubuntu image into text mode, but I have absolutely no idea how it all hangs together beyond the SoC docs. For that matter, the page for the Unmatched doesn’t have a link to a hardware manual either.


I have an ESP32-C3 and it just works!

Sadly I haven't found much new that it enables yet either (because at this point Xtensa is quite well supported too).


Realistically, it doesn't really enable anything new. It's largely a replacement for the ESP8266. It has a few more features of course, but I think the point of early RISC-V products is that they're comparable to existing ARM (or other ISAs) products in their respective spaces.

I like that the ESP32-C3 is a RISC-V chip, but I realized after playing with it for a week that it turned out to not mean much - that's probably because I'm a pretty simplistic user though.


A few years ago the only Wifi MCU you could use Rust on was an obscure Realtek one - because it had an ARM core. Having a standard core would've been a great benefit back then, but it seems like nowadays all the tooling you need has been ported to their current architecture as well.


If I go through my parts box it's not hard to find Wifi modules that require a substantial amount of config and in some cases a solid understanding of how WiFi works. And they were 10s of dollars a unit. I'm delighted that modules like the ESP8266 showed up and made it easy and shockingly cheap. There was a real windfall moment when Arduino syntax was ported to the 8266, and Espressif has really taken advantage of it.


Several other ARM-based wifi MCUs existed from the major semiconductor companies (TI) and other competitors (Nordic Semi).


Apple brought their chipset in house, using ARM as a base, to control their supply chain, cut costs, and allow for a RISC-based architecture (which honestly just makes sense these days). The biggest hindrance to RISC-V adoption will be if ARM remains customizable, cheap (from an IP perspective), and has widespread adoption. Honestly, I'm okay if RISC-V remains somewhat niche, but provides the necessary market pressure to keep non-open competitors honest. There's a meta-benefit to open standards as well.


ARM is already attacking one of their largest licensees, and RISC-V is already in use at Apple. As Apple’s hardware accelerated x86 translation proves, they can proficiently transition to any hardware platform. The ISA really isn’t that important anymore, and I expect Apple to do what makes the most sense regarding price/performance.


Well, switching to a new architecture is still probably extremely expensive, which would permanently shift price/performance in favor of ARM unless RISC-V actually provides something unique.

Freedom from licensing etc. shouldn't be a problem for Apple, unless maybe they want to sell their chips to third parties, which doesn't seem too likely.


Apple has a perpetual ARM license. It's unlikely they'll transition to RISC-V.


My suspicion is they'll fork the Arm architecture at some point (either via a legal agreement with Arm or just taking the deploy-the-lawyers approach and doing it unilaterally). Apple like their walled gardens and closed ecosystems, so there's no clear reason for them to switch to RISC-V whilst they're very happy with their Arm ecosystem.


I do not know the details of Apple's agreement with ARM. AIUI they are not public.

But I do know they have a perpetual free RISC-V license like everybody else.


I worked at a semiconductor company that had an ARM architectural license and we implemented our own ARM CPU. We still had to pass ARM compliance tests.

I heard from some of the people on the CPU team that Apple did not have to pass those tests because they had an exemption from ARM. Apple was one of the original ARM investors when it was spun off from Acorn in 1990. I think that agreement allows them to do things with ARM like adding custom things that other ARM licensees aren't allowed to do.


> We still had to pass ARM compliance tests.

You are making it sound like a burden, but I do not understand why it would be. Surely you are interested in safety checks giving you more confidence that you are positioning yourself to take advantage of the Arm ecosystem (instead of releasing something that is not supportable by compilers and OS experts)? Or am I misunderstanding the nature of the compliance tests?


I think the compliance tests were making sure that your add instruction or memory load instruction followed the spec.

You mention ARM ecosystem and that is precisely the point. Apple controls their ecosystems. How do you write apps for the iPhone? You use Apple's development environment with Apple's compiler. If Apple decided NOT to implement an instruction for some reason they could simply make their compiler never output that instruction.

I worked on chips that ran embedded applications with no ability for ordinary users to change the software. What is the value of meeting an external ARM controlled spec for that?

I also worked on chips that only ran Android and nothing else. If you are also the company porting Android and writing all drivers for your own platform then people may argue whether it is worth only being 99% compatible.

Later I worked on chips where people may run Android, Windows Mobile, or plain Linux on this chip using GCC, Clang, Microsoft's compiler, or whatever. For that you definitely wanted to comply with specs.


Over a long time horizon, it's probably in Apple's interest to migrate away from ARM, just so they have one less company to negotiate with or receive restrictions from.


Apple helped to design Armv8, so they probably have huge influence on the direction of travel of the ISA. Plus they have a huge investment in Arm-based software and probably get access to all of Arm's IP (e.g. big.LITTLE).

They will migrate when there is a major performance advantage (as in the past).


All it would take is for AWS/Azure to start offering a RISC-V Graviton-like system, and I suspect we could see development skyrocket.


I've had the same thought and took it even further. The cloud companies are in the business of managing capital, namely datacenters. They are not in the business of designing µarchs; to the extent that they do (i.e. Graviton) it is to lower costs. For this reason I believe it is economically favorable for them to jointly create and fund an open µarch. Note that they are already doing this through a middleman known as the Intel/AMD monopoly. Also note that a similar thing has happened already with Linux. Collectively throwing a few hundred million at the problem (or buying SiFive, for example) could save money in the long run. Although I can't be certain about that.

I hope someone pitches this idea to these companies. I think some people at Google may be thinking in this direction being that they are paying for tape out of open hardware at no cost to the developers and are investing in tooling.


That's what Alibaba is already doing too. They bought T-Head and have been releasing open source RISC-V cores that have been taped out in a move to "commoditize your complement", thus bringing total costs down by making their vendors' markets more competitive.


Graviton isn't a custom uarch though. It's a set of fairly standard cores, with all the actual magic being the fabric and peripherals.


Agreed, and that would be the result of a lot of development.


ARM committed the sin of suing a really big customer of theirs, so there is likely some exit-seeking all around the industry


I’m not sure what else Arm is supposed to do if it believes that the really big customer has broken their contract with them?

I actually think this ought to be a positive for customers who are clearly abiding by the terms of their contracts.


It doesn't matter which party is correct; it's now added risk that ARM could be the bad actor.


Why if Arm is correct does that add a risk that it's the bad actor?

Most firms will have legal departments that can look at this case on its merits and decide whether or not Arm is acting in bad faith.

If I'm another Arm customer I definitely don't want a competitor playing fast and loose with its Arm contract.


> Why if Arm is correct does that add a risk that it's the bad actor?

Because nobody wants to fight over the licensing of a product they're shackled to. If you could buy a RISC-V board with software support similar to a Raspberry Pi, ARM's goose would be cooked. Every enthusiast would ditch ARM in a heartbeat for a more open ISA, and ARM licensees would see it as an opportunity to finally wiggle free of ARM's insane license restrictions. All we need is the software support, which should be pretty forthcoming since most projects have already been optimized for RISC.

ARM could unseat x86 because both ISAs were encumbered with licenses at the time. Now, ARM is competing with much less restrictive architectures, and all it would take is a FOSS RISC instruction set to ruin their value prop.


> of ARM's insane license restrictions

You're assuming high-end RISC-V cores will be available for free and/or under much more favorable licensing than ARM cores. Which seems unlikely: why would someone sell their cores to a competitor for less than a company whose only business is designing them?


Same reason why people would contribute to an open source compiler and toolchain and distribute for less than the cost of the old school paid compiler vendors like Borland. These contributors aren't really in the business of selling compilers and simply have strategic reasons to drop the floor of that market as much as possible. The same applies to RISC-V cores, with probably the most prominent example being Alibaba/T-Head and their open source cores.


Sounds more like another "this is the year of Linux"...

Also, hardware is pretty different from software. Hardware design (and obviously manufacturing) requires much more significant investment.

> These contributors aren't really in the business of selling compilers and simply

That's the thing. Apple was/is not in the business of selling compilers, nor are the people/companies who contributed to gcc and most other open-source projects. They have basically nothing to lose and a lot to gain from contributing to open-source software.

It's not obvious to me that CPU design could work that way. For starters, everyone who could develop advanced RISC-V cores is unlikely to give them away to their competitors, since they would be in the business of selling CPUs. I.e. do you really think Qualcomm, Apple etc. would make their designs free? Why?

> Alibaba/T-Head and their open source cores

Because at this point they have more to gain than by keeping them proprietary. RISC-V is not yet overall competitive with ARM. So it's not like these RISC-V cores could be commercialized that successfully. Keeping them open probably makes further development faster.


> Because at this point they have more to gain than by keeping them proprietary. RISC-V is not yet overall competitive with ARM. So it's not like these RISC-V cores could be commercialized that successfully. Keeping them open probably makes further development faster.

This open source core is about equivalent to a Cortex-X1. https://github.com/OpenXiangShan/XiangShan A collab between Alibaba, Tencent, and the Chinese Academy of Sciences.


And the kicker: It is open source.


> Because nobody wants to fight over the licensing of a product they're shackled to.

So when a firm licenses a RISC-V core from SiFive, they should be free to do whatever they want with that core irrespective of the license terms?


No, but they have the freedom to design their own core if SiFive threatens them in the way ARM does.


So you believe that firms should abide by contracts but that firms shouldn’t try to enforce the terms of those contracts.

I find it odd how Qualcomm is portrayed as though they were innocently minding their own business when they suddenly got sued. They bought Nuvia knowing what was in the two sets of contracts.

Also Qualcomm had and still has the ability to design their own Arm cores.


> So you believe that firms should abide by contracts but that firms shouldn’t try to enforce the terms of those contracts.

No one is saying that they should or shouldn't (or at least I haven't seen anyone saying this).

But, when you exercise a right, there are social consequences. Freedom of speech is a right (in the US at least), but picketing funerals will still get you dis-invited from a lot of parties. Suing big customers (unless you're obviously in the right when viewed from the outside) will make at least some people more nervous to do business with you.


I'm not talking about Qualcomm, really. I'm talking about companies like Apple, who really only have a cursory attachment to ARM as an ISA. Then there's the hundreds of smaller manufacturers who have zero attachment to ARM and would much rather build hardware on their own terms. Those are ARM's moneymakers, and those are the companies that frankly have the most to gain from using RISC-V.

If Qualcomm is a relevant topic regarding ARM's success, then they've arguably already failed.


Does Apple pay anything meaningful to ARM? They're a founding member with (I believe still) a significant stake. I find it doubtful they didn't secure themselves a perpetual license when they founded the company, and that seems to be what the Internet believes to be true.

I'm not so sure it's the expensive large chips, made in relatively small quantities, that make ARM the most money, do you have a source? I'd have guessed they actually make more on the billions of small ARM cores that ship every year that end up by multiples in pretty much every device with a battery or power cord. And these, I think, are at the biggest risk of leaving ARM. RISC-V development is mature enough at this end of the market that it's relatively easy for users to transition, there are multiple competitive cores on the market, and there's no concern here about backwards compatibility because these are embedded systems where there's usually not an expectation of having to run user code at all. It will be much harder for the likes of Qualcomm where there's a huge ecosystem built around their ARM processors - but as a share of cost-per-processor, they probably stand to gain the most. Qualcomm is a founding member of the RISC-V foundation after all.


> They're a founding member with (I believe still) a significant stake

No, ARM is 100% owned by SoftBank. Apple sold their shares a while ago.

Overall ARM doesn't really make that much money, especially compared to some of their clients. It's not even obvious to me if it would make sense financially even for Qualcomm to design their own RISC-V cores compared to licensing from ARM.

I mean they could and did design their own cores, but they are still using ARM-designed ones for their top-end chips.


> would much rather build hardware on their own terms

Most of these firms are licensing a core from a third party. There is no such thing as ‘on their own terms’.


Not Apple or many of the smaller manufacturers. Even still, the ones who do want to license core designs still have the option to do so with RISC-V.


> smaller manufacturers who have zero attachment to ARM and would much rather build hardware on their own terms

Or they believe that designing their own cores would be cost prohibitive and prefer licensing them from ARM.

It's also not obvious to me that licensing conditions for RISC-V could be more favorable than what ARM is offering. Why would anyone even license competitive cores to their competitors if they can make more money selling them themselves?


Completely agree with this and with your other comments in this thread.

There simply isn't that much money in designing cores. The money is in selling SoCs or devices. Arm at least has made increasingly high quality cores available at reasonable prices to all comers. A future where Arm's business is made unviable is not necessarily better for consumers.

You can license RISC-V cores from SiFive today, but if reports are correct they were in discussions to sell to Intel. If that were to happen, who knows what would happen to their offering. There will be others of course, but as you say it's not obvious that what they offer will be any better than what Arm offers today.

And for the foreseeable future Arm is immune to takeover by any of its deep-pocketed customers.

I'm very happy that RISC-V exists from a number of perspectives but there needs to be a realistic assessment of its potential impact.


The Intel sale didn’t go through


You’re right but the point is that Intel was very interested. If you’re relying on cores from firms that are attractive to SoC / device makers then there is no guarantee those cores will continue to be available post acquisition.


The truth is, it doesn't matter on HN.

Facebook, Oracle, and Qualcomm all carry Original Sin here. You will hardly find any support for them regardless of the situation.


It’s not obvious who is in the right, and in my armchair opinion it looks like ARM has a slightly weaker argument. IANAL, but all of this makes ARM appear more litigious than egregiously wronged.


Apple is in a unique position with a perpetual ARM license, so they don't feel any particular competitive pressure from RISC-V. In fact they're adopting it for microcontrollers and such.


Apple had a job listing last year for something which needed RISC-V knowledge.

https://www.tomshardware.com/news/apple-looking-for-risc-v-p...


It's common knowledge they are planning to replace some of their low-end embedded cores with RISC-V ones.

RISC-V is probably 5-10 years behind on the upper end, though.


I've only coded for ARM. On those devices, the toolchains etc handle the code logic, and each ARM device will have its own peripherals.

Working at the MCU-register level and higher, is there any difference in RISC-V? Or does it only matter if writing compilers and toolchains?

Maybe the way you handle interrupts changes? E.g. you would no longer use a Cortex-M library to handle them. Maybe low-power modes too, since those use ARM registers in addition to MCU-specific ones?


Basically, no, IMO. I do embedded systems. Most of the important differences would be covered under the peripheral side. It would be no different than moving to a drastically different ARM chipset.

As disappointing as it may be to some people, architecture rarely winds up mattering much (although you might be able to argue that some elements that make it into the peripherals are "architectural"). At least for software developers, there will never be a reason to dig into architecture. Maybe for some interrupt- or hardware-exception-related stuff.

I've found that the specific chipset and compiler tend to be more important for software developers working close to hardware. Even ASM code probably won't really matter, unless you plan to heavily rely on it; I would guess it could come into play if you plan on writing heavy ASM code, which I doubt you'd ever do on a modern architecture... maybe you're REALLY trying to save memory, or think you can improve CPU performance for some reason. Seems far fetched in almost every application, but I bet people out there are doing it for good reasons.

Of course, there may always be little places where you end up doing stuff with ASM, but again IME they tend to be where the peripherals and architecture meet.


Most of the time it's only an issue if you're writing assembly.


It is fairly different from a systems programming perspective. The base instructions (I) are essentially everything you'd expect when writing any kind of regular program, and it feels very normal and natural for anyone who has ever written assembly, but once you start needing fancier things like exceptions you'll see a lot of new RISC-V-specific design choices. This is sort of to be expected; x86 has a different exception architecture from ARMv8-A. It's just different, not necessarily less capable. Odds are whatever RISC-V MCUs come on the market will eventually be supported with the same SDKs you know and love, but they will have a different implementation for all the system-specific functions.
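
To make that concrete, here is a minimal bare-metal sketch of machine-mode trap handling on RISC-V in C, assuming a GCC-style toolchain. The CSR names (mtvec, mcause, mepc) come from the privileged spec; the handler body and the +4 skip are purely illustrative (a real handler would check for compressed instructions and go through the platform's CLINT/PLIC):

    /* Minimal M-mode trap setup sketch; CSR names are from the RISC-V
     * privileged spec, the handler body is illustrative only. */
    #define MCAUSE_INTERRUPT (1UL << (__riscv_xlen - 1))

    static inline unsigned long csr_read_mcause(void) {
        unsigned long x;
        __asm__ volatile("csrr %0, mcause" : "=r"(x));
        return x;
    }

    static inline unsigned long csr_read_mepc(void) {
        unsigned long x;
        __asm__ volatile("csrr %0, mepc" : "=r"(x));
        return x;
    }

    static inline void csr_write_mepc(unsigned long x) {
        __asm__ volatile("csrw mepc, %0" :: "r"(x));
    }

    /* mcause says whether this is an interrupt or a synchronous
     * exception; mepc holds the trapping PC. */
    __attribute__((interrupt("machine"), aligned(4)))
    void trap_handler(void) {
        unsigned long cause = csr_read_mcause();
        if (cause & MCAUSE_INTERRUPT) {
            /* e.g. timer interrupt: acknowledge it via the CLINT/PLIC */
        } else {
            /* e.g. illegal instruction: skip it and continue */
            csr_write_mepc(csr_read_mepc() + 4);
        }
    }

    void trap_init(void) {
        /* Direct mode: all traps vector to trap_handler. */
        __asm__ volatile("csrw mtvec, %0" :: "r"(trap_handler));
    }

This is roughly the RISC-V analogue of what a Cortex-M vendor SDK hides behind the NVIC and its startup code.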


RISC-V seems really cool. It's come so much farther than I would have guessed and seems to have a bright future.


I really like the idea of RISC-V, open firmware, open source, etc.

What makes me a hypocrite to some degree is addiction to devices like my Apple Watch. The freedom to not carry a phone and still get texts and phone calls is awesome, as is being able to check the weather, etc.

As I “get more retired” I am so tempted to drop the entire Apple ecosystem that drives my work and content-creation workflow and just chill with a Linux laptop for recreational learning projects.


Does anyone know of a good wiki for doing multicore RISC-V on FPGA? Something more substantial than:

https://www.reddit.com/r/RISCV/comments/z6xzu0/multi_core_im...

When I got my ECE degree in 1999, I was so excited to start an open source project for at least a 256+ core (MIPS?) processor in VHDL on an FPGA to compete with GPUs so I could mess with stuff like genetic algorithms. I felt at the time that too much emphasis was being placed on manual layout, when even then, tools like Mentor Graphics, Cadence and Synopsys could synthesize layouts that were 80% as dense as what humans could come up with (sorry if I'm mixing terms, I'm rusty).

Unfortunately the Dot Bomb, 9/11 and outsourcing pretty much gutted R&D and I felt discouraged from working on such things. But supply chain issues and GPU price hikes for crypto have revealed that it's maybe not wise to rely on the status quo anymore. Here's a figure that shows just how far behind CPUs have fallen since Dennard scaling ended when smartphones arrived in 2007 and cost/power became the priority over performance:

https://www.researchgate.net/figure/The-Dennard-scaling-fail...

FPGA performance on embarrassingly parallel tasks scales linearly with the number of transistors, so more closely approaches the top line.

I did a quick search and found these intros:

https://www.youtube.com/watch?v=gJno9TloDj8

https://www.hackster.io/pablotrujillojuan/creating-a-risc-v-...

https://en.wikipedia.org/wiki/Field-programmable_gate_array

https://www.napatech.com/road-to-fpga-reconfigurable-computi...

Looks like the timeline went:

2000: 100-500 million transistors

2010: 3-5 billion transistors

2020: 50-100 billion transistors

https://www.umc.com/en/News/press_release/Content/technology...

https://www.design-reuse.com/news/27611/xilinx-virtex-7-2000...

https://www.hpcwire.com/off-the-wire/xilinx-announces-genera...

I did a quick search on Digi-Key, and it looks like FPGAs are overpriced by a factor of about 10-100 with prices as high as $10,000. Since most of the patents have probably run out by now, that would be a huge opportunity for someone like Micron to make use of Inflation Reduction Act money and introduce a 100+ billion transistor 1 GHz FPGA for a similar price as something like an Intel i9, say $500 or less.

Looks like about 75 transistors per gate, so I'm mainly interested in how many transistors it takes to make a 32- or 64-bit ALU, and for RAM or DRAM. I'm envisioning a 16x16 array of RISC-V cores, each with perhaps 64 MB of memory for 16 GB total. That would compete with Apple's M1, but with no special heterogeneous computing hardware, so we could get back to generic multicore desktop programming and not have to deal with proprietary GPU drivers and function coloring problems around CPU code vs shaders.


A k-LUT SRAM-based FPGA where k=6 is something like 100x more inefficient in terms of raw transistor count than the equivalent ASIC gates when I last did some napkin math (though there's a statistical element in practice when considering k-LUTs vs fixed netlists). But the SRAM requirements scale 2^k with the LUT size, so the highest you get in practice today is 6-LUTs and 80% of vendors do 4-LUT. And then you need all the transistors for the scan chain to configure each bit, actual block RAM, fixed-function DSPs, etc. Clocking them to really high frequencies is also difficult. They're mostly overpriced because the market is small and the only competitors in town can rob you; bulk buyers paying 1/50th of the list price per device isn't unusual.
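
For anyone who wants to redo that napkin math: a k-LUT stores 2^k configuration bits, so the cost blows up quickly. The per-cell and per-mux-leg transistor counts below are illustrative assumptions, not vendor figures:

    /* Rough illustration of the 2^k scaling; the constants are assumed. */
    #include <stdio.h>

    int main(void) {
        const int sram_cell_xtors = 6;   /* assumed 6T SRAM config cell    */
        const int mux_leg_xtors   = 2;   /* assumed pass-gate mux, per bit */

        for (int k = 4; k <= 6; k++) {
            int bits  = 1 << k;          /* a k-LUT holds 2^k truth-table bits */
            int xtors = bits * (sram_cell_xtors + mux_leg_xtors);
            printf("%d-LUT: %2d config bits, ~%3d transistors "
                   "(vs. a handful for the ASIC gate it may replace)\n",
                   k, bits, xtors);
        }
        return 0;
    }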

But the big problem isn't really the hardware. You can go create and simulate a robust design in a number of HDLs, there are all kinds of tools to do it now, though they aren't as fancy as proprietary ones. It's doable. But it's having a good software stack that matters. You can blow a billion dollars on a chip and still lose because of the software. Nvidia figured this out 10 years ago and everybody else is still playing catch up.

And it takes an insane amount of software engineering to make a good compute stack. That's why Nvidia is still on top. Luckily there are some efforts to help minimize this, e.g. PyTorch-MLIR for ML users, though more general compute stack needs, such as accelerated BLAS or graph analytics libraries or OpenMPI, are all still "on you" to deliver. But if you stick your head in the sand about this very important point, you're going to have a bad time and waste a lot of money.


A 100× transistor count would amount to basically 6 to 7 doublings of Moore's law. Or 10× in nm device lengths. So once the inherent difficulties of designing a chip for trailing-edge ASIC are addressed (with better free EDA tools and such) it seems that FPGA-based commercial products (as opposed to use of FPGAs for bespoke prototyping needs) should become quite uncompetitive. There's also structured ASIC, multi-project wafer, etc. design approaches that are sort of half-way, and might provide an interesting alternative as well. OTOH, FPGA's might also be more easily designed to integrate pre-built components like CPU cores, and the 100× rule wouldn't apply to such parts if used in an FPGA-based design.


100x transistor count is a big enough difference that you would want some sort of ALU and branching/FSM/loop unit arranged in an array with a few fpga elements on inputs and outputs.

It sounds to me that the real problem is still that the ideal programming model hasn't been created yet.

What would it look like? Compile a C function into an FPGA pipeline? A dataflow language where you explicitly define processes and their implementation?

I mean imagine if you could take a mathematical formula, how would it translate into a series of additions, subtractions and multiplications? You could write it down in Fortran and then you would have a well defined and ordered tree of operations. Do we just translate that tree into hardware? Like, you have 100 instructions with no control flow and you just translate it into 100 ALUs? Does it make sense to reuse the ALUs and therefore have 100 instructions map to less than 100 ALUs?

If we assume the above model, then there are specific requirements for the hardware.

What if I need more than one algorithm? Can I switch the implementation fast enough? Can the hardware have multiple programmed algorithms for the same ALUs?

Programmable ALUs sound awfully close to what a CPU does but in theory you would just have a register file with say 16 different ALU configurations and the data you are sending through the ALUs is prefixed with a 4 bit opcode that tells the ALU which configuration to use. We are getting further and further away from what an FPGA does and closer to how CPUs work but we still have the concept of programming the hardware for specific algorithms.

These are just random thoughts but they reveal that the idea of a pure FPGA is clearly not what the accelerator market needs.


>And it takes an insane amount of software engineering to make a good compute stack

That isn't actually true. It takes commitment to the platform. My AMD GPU from 2017 had its ROCm support dropped in 2021. Very nice. That sends a signal that ROCm isn't worth supporting by application developers. In other words, AMD decided that there are ROCm GPUs and gaming GPUs, whereas with Nvidia there is no distinction.


What sort of wiki are you envisioning here? There is some decent tooling and docs around generating SoCs [1] but, as the article mentions, the most difficult part is not creating a single RISC-V core but rather creating a very high performance interconnect. This is still an open and rich research area, so your best source of information is likely to just be Google Scholar.

But, for what it's worth, there do seem to be some practical considerations why your idea of a hugely parallel computer would not meaningfully rival the M1 (or any other modern processor). The issue that everyone has struggled with for decades now is that lots of tasks are simply very difficult to parallelize. Hardware people would love to be able to just give software N times more cores and make it go N times faster, but that's not how it works. The most famous enunciation of this is Amdahl's Law [2]. So, for most programs people use today, 1024 tiny slow cores may very well be significantly worse than the eight fast, wide cores you can get on an M1.

[1] https://chipyard.readthedocs.io/en/stable/Chipyard-Basics/in...

[2] https://en.wikipedia.org/wiki/Amdahl's_law
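
As a quick illustration of why 1024 tiny cores can lose to 8 fast ones, here is Amdahl's Law [2] in a few lines of C; the parallel fractions are made up for illustration:

    /* Speedup = 1 / ((1 - p) + p / N) for parallel fraction p on N cores. */
    #include <stdio.h>

    static double amdahl(double parallel_fraction, int cores) {
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
    }

    int main(void) {
        const double fractions[] = {0.50, 0.95, 0.999};
        const int cores[] = {8, 64, 1024};

        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                printf("parallel=%.3f cores=%4d -> speedup %.1fx\n",
                       fractions[i], cores[j],
                       amdahl(fractions[i], cores[j]));
        return 0;
    }

Even a 95%-parallel workload tops out around 20x, no matter how many tiny cores you throw at it.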


The problem isn't that algorithms are inherently sequential though but rather that parallel programming is a separate discipline.

In single threaded programming you have almost infinite flexibility, you can acquire infinite resources in any arbitrary order. In multithreaded programming you must limit the number of accessible resources and the order is preferably well defined.

In my opinion expecting people to write parallel algorithms is too much, not because it is too difficult but rather because it has to permeate through your entire codebase. That is a nonstarter unless the required changes are not unreasonable.

The challenge then becomes, how do we let people write single threaded programs that can run on multiple cores and gracefully degrade to being single threaded the worse the code is optimized?

I don't have the perfect answer but I think there is an opportunity for a trinity of techniques that can be used in combination: lock hierarchies, STM and the actor model.

There is a unit of parallelism like the actor model that executes code in a sequentially single-threaded fashion. Multiple of these units work in parallel; however, rather than communicating through messages, STM is used to optimistically execute code and obtain a runtime heuristic of the acquired locks. If there are no conflicts, performance scales linearly; if there are conflicts, then by carefully defining a hierarchy of the resources, you can calculate the optimal level in the hierarchy to execute the STM transaction in. This will allow the STM transaction to succeed with a much higher chance, which then eliminates the primary downside of STM: performance loss due to failed transactions, whose failure rate creeps up the more resources are being acquired.

A lock hierarchy could look like this: object, person, neighborhood, city, state, country, planet.

You write an STM transaction that looks like single-threaded code. It loops around all people in Times Square and would thereby acquire their locks. However, if that transaction were executed on the object level, it would almost certainly fail, because out of thousands of people only one needs to be changed by another transaction for it to fail. The STM transaction acquired a thousand people, therefore its optimal level in the hierarchy is the neighborhood. This means only one lock needs to be acquired to process thousands of people. If it turns out that the algorithm needs information like the postal address of these people, it is possible that some of them are tourists and you therefore acquire resources that are all over the world; you might need to execute this transaction at the highest level of the hierarchy for it to finish.

The primary objection would be the dependence on STM for obtaining the necessary information about which resources have been acquired. This means that ideally, all state is immutable to make software transactional memory as cheap as possible to implement. This is not a killer but it means that if you were to water this approach down, then the acquired resources must be static, i.e. known at compile time to remove the need for optimistic execution. That still works and it lets you get away with a lot more single threaded code than you might think, especially legacy code bases.
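
A minimal sketch in C of the "pick the optimal level in the hierarchy" step described above. The levels, lock counts, and threshold are hypothetical; a real STM would gather the counts from an instrumented run of the transaction:

    #include <stdio.h>

    enum level { OBJECT, PERSON, NEIGHBORHOOD, CITY, STATE, COUNTRY, PLANET, NUM_LEVELS };
    static const char *level_name[NUM_LEVELS] =
        { "object", "person", "neighborhood", "city", "state", "country", "planet" };

    /* Given how many distinct locks the transaction would need at each
     * level, pick the finest level whose lock count stays manageable. */
    static enum level pick_level(const int locks_per_level[NUM_LEVELS], int max_locks) {
        for (int l = 0; l < NUM_LEVELS; l++)
            if (locks_per_level[l] <= max_locks)
                return (enum level)l;
        return PLANET;
    }

    int main(void) {
        /* e.g. a transaction that touched ~2000 people in one neighborhood */
        int touched[NUM_LEVELS] = { 5000, 2000, 1, 1, 1, 1, 1 };
        enum level l = pick_level(touched, 8);
        printf("run transaction at %s level\n", level_name[l]);
        return 0;
    }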


Shame that OpenSPARC T1 was a dead end. And Open MIPS. etc..

RISC-V is maybe finally something that will stick because it can't be killed by a stupid company.


You could do a trial build of an in-order Rocket RISC-V core [1] to see how much space it takes up.

[1] https://github.com/chipsalliance/rocket-chip


As a high-level language programmer, I look forward to understanding my computer all the way down, i.e. including the ISA, which is not happening on amd64.


MIT 6.004 was one of my favorite classes. I think you can probably find it on OpenCourseWare. Over the course of a semester, you work your way up from very basic transistor physics to basic logic gates to a DEC Alpha AXP-inspired CPU (but 32-bit instead of 64-bit, the program counter is one of the numbered registers, and the high bit of the program counter determines if it's in kernel or userspace mode) that you've designed at the logic gate level. There are test suites that run in a simulator (jsim), where the kernel includes emulation for multiply and divide instructions in the illegal operation trap handler... so as long as you've correctly implemented the most basic instructions, you can run some fairly complicated software in emulation.

In the class, there were competitions for the highest benchmark scores (in emulation) and highest ratio of benchmark score to gate count (again, in emulation). The high performance implementations end up having 5 stage pipelines with bypass networks to avoid having to wait for intermediate results to hit the register file. I remember even having the carry logic in my ALU alternate the sense of the carry at every level up the tree in order to save one gate delay at each level. My year, I don't think anyone implemented register renaming, but I wouldn't put it past some students.

I would love to see 6.004 switch from Beta to RISC-V and also for the Stanford compilers class to switch from MIPS to RISC-V.


Some of my education in this is from a book called Inside the Machine which I really enjoyed. I noticed I was able to easily understand the earlier architectures but at some point they got really complex and I didn’t find it accessible. I took a look at amd64 and it was like.. haha no. I’m sure I could learn it if I was spending every day with that layer but I’m not. From what I’ve seen of RISC I’m hopeful I can retain it even going long intervals without looking at it.


For understanding what's going on all the way down to the ISA level, I'd say something like a PIC microcontroller is even easier since they tend to have only around 50 instructions, though RISC-V has the advantage that it could scale all the way up to potentially desktop CPUs some day


TSMC is not going to give up their trade secrets, sorry.

Also, if you are using a high-end RISC-V (none on the market currently), then the implementation is going to be proprietary and decidedly non-trivial.

I'm not sure in which sense you expect to "understand your computer".


This is a surprisingly snarky/nasty comment "sorry"

> I'm not sure in which sense you expect to "understand

How about this? https://riscv.org/technical/specifications/ If there are a few extensions buried in the chip, that doesn't materially change what I said.


The equivalent manuals are available from Intel and AMD.


The issue isn't the manuals existing, it's what they describe. Basically the R in RISC.


Well, you could go buy and read Fundamentals of Semiconductor Fabrication (I did). This should give you understanding comparable to reading The C Programming Language. Yes, it is shallower understanding than what you would get by writing a C compiler yourself or working at TSMC, but it definitely is still understanding.


And you can definitely understand amd64 on an ISA level, which is the publicly documented "high" level interface.


I didn’t find it accessible.


Let's go!

After decades of open-source software, we finally have an open-source CPU.


Not yet. With a couple of exceptions, the RISC-V CPUs described in the article are all closed source.

RISC-V is an open instruction set, so anyone can implement their own design using it, often with their own proprietary extensions.

Because the instruction set is open, there are indeed hundreds of open source RISC-V designs out there, because basic CPU design is fun, easy and educational.

However, at the moment, nearly all physical RISC-V CPUs you can actually buy, dev boards etc, and devices containing them, use closed source CPU designs.


I wouldn't quite say that's the case. Two of the three full Linux-capable RISC-V SoC releases this year are using open source CPU cores. The BL808 and the Allwinner D1 both use T-Head CPU cores that are available on GitHub https://github.com/T-head-Semi/openc906 . The JH7110 in the VisionFive 2 and Star64 does use a closed CPU core, however.


Look forward to the LM4A (Sipeed, Q1). That's the C910, which is even more capable, far above a Cortex-A72.

Not bad at all for an open source core.


Ibex is open source and has taped out - https://github.com/lowRISC/ibex


What's the elevator pitch for RISC-V? My intuition is that RISC was dead for good reason.


"Permissionless development" - want to develop a RISC-V core? Just download one from github or design your own from the freely available specs. No legal agreements to enter or licensing fees to pay. Very different model from Arm (sign a legal agreement before you start and pay for licensing) or x86 (you want a license? LOL)

RISC is hardly dead. It's more accurate to say it won so comprehensively that it's pervasive. The design principles are now used in every significant chip (including x86).


Licensing innovation: with the open licensing model, anyone can make a RISC-V implementation. There are some rules/costs involved if you want to use the RISC-V trademark with it: https://riscv.org/about/risc-v-branding-guidelines/ but it's certainly a lot cheaper and more permissive than Arm's architectural licensing model.

This enables far more competing RISC-V implementations, with more choice and more ability to modify things to your needs.

On the downside you could see ecosystem fragmentation and lots of subtle incompatibilities causing issues for broad software support.

Ultimately anything technical happening is secondary to this; this is the reason RISC-V is taking off, not anything to do with technical superiority (it simply needs to be good enough).


"Good enough" is an incredibly big thing.

RISC-V doesn't have to displace M2/Epyc/Xeon to be important. There are a trillion cores a year being shipped in embedded devices where "cheap and configurable" is more important than raw performance.

What surprises me is that we haven't seen more "built to emulate" designs yet. Many of those "trillion cores" are zombie architectures with terrible price/performance ratios, but nobody wants to rewrite their software or retool things (anything 8051, AVR or PIC seems vulnerable). I could see replacement modules using a cheap-due-to-scale RISC-V core that just emulates the old chip in baked-in firmware.


>RISC was dead

Dead how? I don't get it.

They're literally everywhere, dominating the numbers both shipping right now and cumulative.

Every new ISA of any significance in the last 3+ decades is RISC.

The only pre-RISC legacy ISA in use is x86, and it is only losing market share.


> The only pre-RISC legacy ISA in use is x86, and it is only losing market share.

And for many generations now, x86 machines are basically RISC processors with a CISC frontend.

Empirically it seems that CISC has 'failed' as a way to design processors, and it's better to let the compiler do that job when you're building a general purpose computer.


That is a meme. Even RISC processors use uops. uops are often wider and more complex than the ISA instructions that they are derived from.

The reason for that is that a lot of features in a CPU instruction are just the result of toggling some part of the CPU on or off so having four one bit flags is better than encoding the same value in two bits. What this means is that you can have more possible instructions available on the uop layer than on the ISA layer. When that is the case you can hardly call the internal design a "RISC" processor. Especially when the ISA wars were specifically about ISAs and not microarchitecture. Even if we say that uops are RISC instructions that still is an argument against RISC ISAs because why bother with RISC as an external interface if you can just emulate it? Your comment seems rather one sided.


Also, the x86 was the most RISC-ish of its cohort (at most one memory operand per instruction, relatively simple addressing modes, etc.); whether that was uncanny design looking forward decades, a result of Intel's internal architectural history, or pure luck, people still argue.

Other designs from the same time followed on from the wunderkind of the era, the VAX, which everyone loved and wanted to emulate.

The big change of the time, though, was in the memory hierarchy: caches pushed closer to CPUs (and eventually pulled on-die), which favoured less densely encoded ISAs, more registers, and instruction sets that didn't require a full memory access for every instruction.

In my professional lifetime we've gone from 'big' mainframes with 1 MHz core cycle times (memory cost more than $1M/megabyte), to those VAXes (actual silicon DRAM!), to what's sitting on my lap at the moment (5 GHz, 8/16 cores, 64 GB DRAM, etc.).

I don't think CISC 'failed'; it was simply a child of its time, and the constraints changed as we moved from giant wirewrapped mainframes with microcode to minimise ifetch bandwidth, to LSI minicomputers, to VLSI SoCs with multi-megabyte on-chip caches.


Well, to be fair, this was easy to predict. RISC was only created because CISC had already failed at that time; people were already letting their compilers do the job and leaving the specialized instructions basically unused.


In hindsight, sure, it seems obvious, but I don't think it was that obvious that CISC performance wouldn't end up scaling. What wasn't obvious, to me anyway, is that even with all of that complicated decode, register renaming and whatnot, these CISC processors managed to stay competitive for so long. Maybe it's just the force of inertia, though, and if there'd been a serious investment in high-performance RISC machines in the 2000s, x86 would've been left in the dust.


Oh, I'd say it is still not really obvious. Sure, RISC has a complete win now, but there is no reason to think that the ultimate architecture for computers won't be based on complex instructions.

We even had some false starts: when vector operations were first created, they were quite complex, then they were simplified into better reusable components; also, when encryption operations started to appear, they were very complex, then they were broken into much more flexible primitives. There is nothing really saying that we will be able to break down all kinds of operations forever.

But still, I wouldn't bet on any CISC architecture this decade.

Anyway, the reason x86 lasted for so long was Moore's law. This is patently clear in retrospect, and obvious enough that a lot of people called it in advance since the 90s. Well, Moore's law is gone now, and we are watching the consequences.


Won't CISC architectures always have the benefit of being able to have dedicated silicon for those complex instructions and thus do them faster than many smaller instructions? I understand RISC-V does instruction fusing, which provides a lot of the same benefits, but I'm surprised ARM gets around this.


Dedicated silicon for custom instructions is now quite favored actually, because of the well known "dark silicon" problem. I.e. most of the die actually has to be powered down at any given time, to stay within power limits. Hence why RISC-V is designed to make custom, even complex instructions very easy to add. ("Complex" is actually good for a pure custom accelerator, because it means a lot of otherwise onerous dispatch work can be pulled out of the binary code layer and wired directly into the silicon. The problem with old CISC designs is that those instructions didn't find enough use, and were often microcoded anyway so trying to use them meant the processor still had to do that costly dispatch.)
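
For a sense of how cheap it is to expose such an accelerator in RISC-V, GNU binutils lets you emit instructions in the reserved custom-0 opcode space directly from C via the .insn directive; the accelerator and its funct3/funct7 encoding here are entirely hypothetical:

    /* Hypothetical accelerator wired to the custom-0 major opcode (0x0b).
     * funct3/funct7 values are made up; real hardware would define them. */
    static inline unsigned long accel_op(unsigned long a, unsigned long b) {
        unsigned long result;
        __asm__ volatile(".insn r 0x0b, 0x0, 0x0, %0, %1, %2"
                         : "=r"(result)
                         : "r"(a), "r"(b));
        return result;
    }

On a stock core without the accelerator this simply traps as an illegal instruction, which is also how you could emulate it.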


The simpler your instructions, the more use-cases they have. But yes, the less benefit you get from specializing them too.

If you can break your specialized instruction into a few much more generic ones, it's probably a gain.


> The only pre-RISC legacy ISA in use is x86, and it is only losing market share.

The only pre-RISC legacy ISA in wide use is x86, but AFAIK the legacy ISA from IBM mainframes (s390/s390x), which is still supported by mainstream enterprise Linux distributions, is also a pre-RISC one.


Dead simple base ISA and easily extensible means it's an obvious choice if you want a custom processor with your own custom instructions.

Scalable so the same ISA (with/without certain extensions) can be used from the smallest microprocessor to high end CPUs.

Compressed instructions give very high code density (and unlike ARM Thumb don't require weird mode switching, and are available for the 64-bit ISA).

It's not the first popular open ISA. There was OpenRISC before it, but it was fatally flawed (branch delay slots). So RISC-V is arguably the first good popular open ISA.


There is also OpenPOWER (under the Linux Foundation), which I assume is also good?

Or is RISC-V strictly better than it?


OpenPOWER simply wasn't open; it was just marketing. Only years after RISC-V started to grow did they change their licensing.

RISC-V is more modular and by now has far more support behind it.


OpenPOWER is also an open, royalty-free ISA. You're right that PowerPC and Power ISA didn't start open, but they weren't called OpenPOWER.


Even when it was called OpenPOWER it was not as open as RISC-V. It's only much later that it became so.


I have several OpenPOWER systems, including the POWER9 I use as my usual desktop. Besides IBM and other server manufacturers like Tyan and Wistron, you can get them as Raptor workstations and servers.

If you want an OpenPOWER design to play with, look at Microwatt ( https://github.com/antonblanchard/microwatt ) which is complete enough to boot Linux.


>My intuition is that RISC was dead for good reason.

ARM cores run most phones, routers, etc, and it's an acronym for Acorn RISC Machine.


Last I checked, most hard drives contain ARM cores, so some x86 machines actually contain more ARM cores than x86 cores. An ICE car likely has one or more ARM cores managing the engine and one or more ARM cores running the entertainment system.


Yeah but technically the brand has moved on from being an acronym. Now it is just Arm or "arm". There is no ARM.


Not dead at all, it's now a (conceptual) rocket ship.

No ISA/architecture licensing fees, no restrictions on what you can do with it: you can build an open source core or you could sell your design to others. ARM/x86 doesn't have this; it's all locked up behind lawyers.


In addition to the permissionless model as mentioned, another consideration is to standardize all the boring common bits. RISC-V incorporates a lot of industry learning about how to handle extensions, compression, etc. If a company is considering RISC-V vs a proprietary design for an accelerator or such, getting debugged versions of those features in the ISA vs whatever gets cooked up in house in crunch mode could be pretty big.


Isn't ARM a RISC? So no, far from dead, RISC is already hugely successful.


ARM, MIPS, Power are all RISC. The only real CISC processor in mainstream usage seems to be the PICO, and of course, the x86 and amd64 are currently RISC processors with a CISC interpreter in front of them.


>the x86 and amd64 are currently RISC processors with a CISC interpreter in front of them.

CISC or RISC is a characteristic of the ISA.

Usage of microops doesn't change the ISA. It is an opaque microarchitecture artifact.


This is a lost battle. People don't even remember what the word ISA means. It is an external interface. Whatever happens inside the processor has nothing to do with interfaces. You can run x86 code on Arm Macs; does that mean they are an x86 processor that is internally implemented with RISC? If yes, then the word has lost its meaning. If no, then you would have to explain why these are distinct cases.


Some argue it has actually become a CISC (special instructions for JavaScript and so on), but I don't know how RISC-V compares.


RISC is not about the number of instructions but rather what the instructions do. The famous example of CISC taken to its logical extreme is the VAX's polynomial evaluation instruction (POLY), which ended up being almost a full program in a single instruction. RISC tends to go the other way, focusing on things that are easy for hardware to do and leaving anything else to software.


ARM's JavaScript instruction is actually a pretty good example of RISC philosophy. The instruction is a single-cycle floating-point conversion to integer using the default x86/JavaScript rounding behaviour rather than the rounding mode in the floating-point control register. It's a great example of the RISC filter test of 'what's a single-cycle instruction that can be noticed in instruction traces as something that will give an actual perf improvement'.
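
For reference, the semantics that FJCVTZS collapses into one instruction look roughly like this in portable C; this is a model of ECMAScript's ToInt32, not the actual instruction, and a JIT would emit FJCVTZS directly:

    #include <math.h>
    #include <stdint.h>

    /* Rough C model of JS ToInt32: truncate toward zero, wrap mod 2^32,
     * NaN/Inf become 0. Assumes the usual two's-complement uint->int cast. */
    static int32_t js_to_int32(double d) {
        if (!isfinite(d))
            return 0;
        double t = trunc(d);               /* round toward zero          */
        double m = fmod(t, 4294967296.0);  /* into (-2^32, 2^32), exact  */
        if (m < 0)
            m += 4294967296.0;             /* into [0, 2^32)             */
        return (int32_t)(uint32_t)m;
    }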


Compressed instructions (https://en.wikipedia.org/wiki/Compressed_instruction_set) negate one of the "good reasons".



