Why I will be using RISC-V in my next chip (adapteva.com)
158 points by adapteva on Jan 9, 2016 | 97 comments



Good that they're getting on the bandwagon, as they're already popular with maker types. I also like the serendipity that this post led me to an equally exciting one:

http://www.adapteva.com/announcements/an-open-source-8gbps-l...

Love that they developed and open-sourced an 8Gbps, 1us I/O interface. That might come in handy. :)


Agree! I love adapteva for a lot of reasons, and this makes me want to keep on piling the love.


[flagged]


Nope, it's MIT-licensed now!!


I didn't check the licensing for GPL because I figured you knew better on HW. Glad you people chose wisely. :)


Even though it isn't GPL anymore, how exactly would it kill all hope?


How much do you see going on in GPL'd hardware, despite tools and motivation being available going back decades? Almost nothing. It takes expensive tools, limited expertise, expensive masks for prototyping... all sorts of expenses that mean it virtually doesn't happen unless the cost can be recovered. Stuff is so outrageously expensive and difficult at better nodes that re-using proven (costly) I.P. is the norm.

Hence, you're going to need something that integrates with proprietary if you want them to build on it. There's still the possibility of a LGPL-style thing where at least re-synthesized or optimized versions of the open-source part must be re-released. However, forcing that for I.P. it integrates with would kill adoption by any provider unless they like operating at six to eight digit losses. Annually.


How does the GPL prevent them from making money? Sure, I have essentially the source code of someone's processor, but unless I have millions of dollars to fabricate a chip, the source's only use is to learn how things work.

Intel, if they wanted to, could release the source code of their microcode for their processors under the GPLv2 and not provide the signing certificate (like TiVo did), and all people would be able to do with it is study it. What am I gonna do with the source code of a processor? Build an emulator? I already have the instruction manual...


Main reply here:

https://news.ycombinator.com/item?id=10873839

Microcode is a separate issue that's debatable. Their CPU hardware, patents, logistics, ecosystem, and branding are the real sources of revenue. A microcode release might not hurt them. However, it might contain trade-secret methods for utilizing the hardware in clever ways that could give competition (mainly startups) an edge. The biggest players, esp AMD, have probably already RE'd it.

That said, I'm for opening up microcode and would even push for a microcoded RISC-V. The reason is you can do all kinds of stuff, from optimizing algorithms to atomic functions. There were also HLL-to-microcode compilers in the past to make it easier.

Look up PALcode on Alpha ISA if you want a nice hybrid, too.


That said, I'm for opening up microcode and would even push for a microcoded RISC-V.

You mean a CISC-V? ;-)


Lol. Clever. I'll keep it Microcoded RISC-V until people start doing something bloated like emulating x86 in it. ;)

cough Loongson people cough


Intel has competitors. Why would Intel want to provide its competitors with what must be, at the very least, tens of millions of dollars worth of work?


You have the right idea, but it's way more than you think. Intel spends over a billion dollars per quarter on all their tech.

https://ycharts.com/companies/INTC/r_and_d_expense

I agree R&D on any given bit of HW design is at least tens of millions. Probably more than once per year going back a decade given it's custom and formally verified. Standard cell approximating that would be a fraction of it but still ridiculously high if aiming for competitive performance/watt.


Yeah, Intel is one of the biggest companies in the world. I was just speculating an absolute floor for just the microcode in particular.


You're the second person to bring up microcode. I'm curious why two people are focused on this rather than the HW itself. What about you?

Note: For readers not in the know, microcode is the sequence of control-signal activations that drives the physical, closed hardware. Instructions, which are merely series of bits, get translated into those activations by a microcode engine running from a memory rather than a hardwired engine. Unlike the ISA, microcode is highly tied to the hardware since it drives it directly.
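As a toy illustration of that description (all opcode and control-line names below are made up, not from any real CPU), a microcode engine amounts to a table lookup that expands each instruction into a per-cycle set of control-line activations:

```python
# Toy microcode engine: each opcode maps to a sequence of micro-ops,
# and each micro-op is the set of control lines asserted that cycle.
# Every name here is hypothetical, for illustration only.
MICROCODE_ROM = {
    "ADD": [
        {"reg_read_a", "reg_read_b"},   # fetch operands onto internal buses
        {"alu_add", "latch_result"},    # drive the ALU, latch its output
        {"reg_write"},                  # write the result back
    ],
    "LOAD": [
        {"addr_to_bus"},                # put the effective address on the bus
        {"mem_read", "latch_result"},   # memory read into the result latch
        {"reg_write"},                  # write the loaded value back
    ],
}

def execute(instruction):
    """Yield (cycle, control lines asserted) for one instruction."""
    for cycle, control_lines in enumerate(MICROCODE_ROM[instruction]):
        yield cycle, sorted(control_lines)

for cycle, lines in execute("ADD"):
    print(cycle, lines)
```

The point of the sketch is only that the ROM contents, not fixed wiring, decide which lines fire each cycle, which is why microcode is so tightly coupled to the specific hardware it drives.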


I'm not focused on this, it was an example used in the post I responded to.


Oh OK


How does the GPL prevent those costs from being recovered?


By allowing anyone to knock off the chip without paying back the R&D. There's already a huge market of counterfeits and clones coming out of China for most things you can think of. It's even easier without reverse engineering: hand the design to a fab, individual or MPW, then package and sell the chips.

Very little of the price of chips you buy is the material or packaging costs. It's mostly R&D recoup, marketing, administrative costs, and profit. Most of these companies recover all that straight through sales of the chips. Eliminate everything but a tiny margin over physical costs, and what's left for further development?


"Very little of the price of chips you buy is the material or packaging costs."

Are you sure about that? How much markup over material costs is there on most chips?


There is no typical markup... but here are some guidelines:

1.2x (companies are either ultra efficient, copycats, suicidal, or dumping)

2x (US companies shipping volume, healthy)

4x (goal for FPGA companies and mixed signal niche)

10x (goal for companies with monopolies)

100x (one-offs, like processors for military/satellite)


I like that breakdown. That sounds about right. Especially the FPGA's and military. :)

Btw, I'd love a hardware engineer's take on my idea for bootstrapping open hardware. The premise is that HW development costs make it too expensive. That led me to consider an open-source FPGA on 45nm or below with a tool-chain. Let academics build (and open) the hardest stuff, with a non-profit integrating that with I/O and other key I.P. onto a small number of chips sold at cost.

Example open FPGA (a start, not end-goal) http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-43...

Example open-flow (add VPR or whatever FPGA tool is called) http://opencircuitdesign.com/qflow/

Idea is we get an easy FPGA set up to allow targeting with open or commercial tools. Its specs are open for review and enhancement. There is a via- or metal-configurable S-ASIC co-developed that matches its architecture. Like with eASIC, people can do an inexpensive conversion if they choose. Even considered using people they turned down in their contest (but whose stuff worked) to build the tooling for that. ;)

So, you have cheap-as-possible 45nm or below FPGA's being cranked out with more I.P. and tooling all the time. S-ASIC conversion is inexpensive due to matched architecture. If designed for it in mind, ASIC conversion wouldn't be too difficult either for FPGA-proven designs given staff and tooling. The HW cost is essentially some important cells, NOC, key IP (eg USB, PCI, DDR), memories, some analog, and at least one integration. With MPW's & academic support, it might cost a fraction of what it otherwise would. Not going to be EDA or HW competitive with Xilinx or Altera but not $600-$2,000 a chip w/ closed bitstream either.

My main worry outside market demand is effect of patent suits, esp if they demand royalties per unit. Otherwise, I'm hoping combo of academics, existing tooling, open FPGA's, and S-ASIC's is a solid combo for a start on widely-accessible, open HW. Still getting feedback from HW pro's before I start knocking on doors, though.


Before answering...what do you see as the problem today and what is your goal?


A few problems I'm trying to knock out in one solution. That's always a risk, but FPGA's are versatile. Here's a few:

1. Subversion and security concerns of all the black boxes in hardware alleviated with open I.P., tooling, and HW. Main concern in my field.

2. Both ASIC and FPGA entry cost to startups trying to get custom stuff out the door. Need to lower it.

3. Hobbyists and academics experimenting currently benefit from FPGA's with some HW and SW ecosystems built around them. I see that expanding with easy-to-use, open tooling. Plus maybe decent HLS system. Synflow is one I know in that niche.

4. I've been pushing potential of FPGA as accelerators for cloud, networked, and HPC applications for a while. There's an uptake in their use but cloud is cost-sensitive. FPGA's w/ OpenCL-style tooling sold at cost in boards pluggable into regular servers or desktops could... well, you know the potential because you're in that market & brave enough to compete on architecture. ;) Pico Computing also comes to mind with their FPGA desktops.

5. Maybe make HW prototyping, verification, etc cheaper as another source of revenue as I've seen some ASIC-oriented products that use FPGA's for this. I know many of the tools are too slow, too, so there's potential for straight acceleration as in 4.

6. Reduce cost (and runtime risk) of FPGA-proven components with S-ASIC conversion. Also thought about anti-fuse but think S-ASIC is legally & physically safer. Not a HW expert, though, so going on second-hand comments there.

So, there's at least 5 potential benefits from an open-source FPGA sold at cost. Especially on performance if people start cramming them into boards like Pico and Maxeler do with cloud company orders. Good on trustworthy, hobbyist, and prototyping ends with low cost & open, expanding SW side.

So, that's the idea. Pretty ambitious and with high failure potential for sure. It's why, if I attempt it, I'll be enlisting cloud companies and academics. AMD's "semi-custom" business seems to indicate some potential if it's integrated into a CPU's memory bus or a CPU itself. Core logic and stuff only needs to be done once on one good node. Subsequent integrations much cheaper.


You can get Lattice FPGAs for a few dollars each from distributors. Zynqs in volume are reportedly as low as $15 each. Tools are free.

If cost is the main concern, then how low do you need to go?

An open FPGA architecture is a good thing, but it will require something akin to the RISC-V effort to make it happen (i.e. major DARPA/NSF, academic, and industry backing to bootstrap). Let's see what happens...


No, a student doing a master's thesis can create an open FPGA:

https://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-4...


I linked to that above. The conversation moved on because, as your paper noted, what they created lacks a lot of the functionality and even core elements of commercial FPGA's. I agree academics can do most or all of it but it will take more than one at Master's level to be competitive.


Subversion and amount of logic are my main concerns. Lattice ain't Virtex-6, last I checked. I agree on the RISC-V/DARPA comparison, though.


I'm not sure on most chips. I do know if you look up MOSIS, etc. prices, you'll find that most of the cost of a prototyping run is the masks. The masks start at tens of thousands and go up from there; high-end nodes are in the millions to tens of millions. Every prototype run of your chip requires a new mask set. Hence all the money spent on proprietary tooling to reduce the odds of digital or physical defects. Many ASIC's still need respins. Then there's 3rd-party stuff like I/O packages.

Now, compared to that, the silicon wafer might cost a few grand with packaging (not volume) costing $10-20 a unit. For any OSS project, dare I say most of the cost will be in the first paragraph and recouping it will take significant unit prices. How significant depends on what's being sold and at what volume. A PIC or AVR clone at modest volume will be cheaper than an Oracle T2 clone at low volume. Hope that helps.
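To make the recoup point concrete, here's a rough break-even sketch using hypothetical figures in the same ballpark as the numbers above (mask/tooling NRE, per-unit physical cost, and one of the markup multipliers from the earlier comment):

```python
# Rough break-even sketch: how many units must sell, at a given markup
# over physical cost, to recoup non-recurring engineering (NRE) costs
# such as masks and tooling. All figures below are hypothetical.
def units_to_recoup(nre, unit_physical_cost, markup):
    """Units needed so that (price - physical cost) * units covers the NRE."""
    margin_per_unit = unit_physical_cost * (markup - 1.0)
    return nre / margin_per_unit

# e.g. $10M of masks/tooling, $15/unit physical cost, 2x markup
print(units_to_recoup(10_000_000, 15.0, 2.0))  # roughly 667k units
```

At modest volumes that unit count is out of reach, which is the argument for why "a tiny margin over physical costs" can't fund further development.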


100% sure - once you have the masks, pumping out chips is cheap, if you do it in volume.

http://www.adapteva.com/andreas-blog/semiconductor-economics...


Key quotes:

"Super complex SOC platforms like Apple’s Ax family or the IBM Cell very likely exceeded $1B in total development costs and involved thousands of engineers in aggregate."

"Meanwhile relatively simple SOCs like the Epiphany family of chips were designed for less than $3M over the span of 3 years.[1,2]"

And Adapteva did such a good job being cheaper than average that they wrote a [different] blog post about how they went about it. So, the low end was $3 million over 3 years, while users wanting to ditch proprietary are asking for price/performance/watts closer to the above quote that cost $1 billion. Hence my prediction that it will take millions to tens of millions to make something acceptable, and also why I push Cavium's model of RISC multicore + HW accelerators for common tasks. I think it will help by taking load (& optimization) off the main CPU's.

Now, people doing analog stuff have it a lot easier given even 350nm is often good for that. That stuff can get really cheap:

http://www.planetanalog.com/author.asp?section_id=526&doc_id...


How does the GPL apply to hardware? Let's say I take some GPL'ed verilog code, modify it a bit, build a chip out of it and sell the chip. Am I required to provide the verilog "source" for my chip? I guess if the chip counts as "object code" in GPL lingo then the answer is yes. There's a lot more than just the verilog code that goes into building a chip though. How is the physical layout, for example, handled? Is that considered part of the "corresponding source" in GPL lingo?


I found an interesting and detailed legal analysis of that. It sounds like the answer isn't black and white:

http://jolt.law.harvard.edu/articles/pdf/v25/25HarvJLTech131...


I was hoping someone would jump in for context. The other two commenters provide that. Now, in that context, let's just say that companies in the ASIC ecosystem sitting on hundreds of millions to billions in I.P. aren't about to (a) integrate a GPL component or (b) let their stuff be used in a GPL component if there's even the slightest chance it causes legal risk.

That's all the good ones. The only exception I know (and promote) was Aeroflex Gaisler putting an ASIC-proven CPU and peripheral IP in a GPL release. That people did nothing with. (sighs) Even he's not doing GPL for new stuff IIRC. The rest continue licensing on a per-producer basis... revenue which GPL would negate.

So, it's just incentives. The positive incentive is licensing, which they're doing. The negative incentives are anything risking their I.P., which GPL does. It's a double whammy against GPL'd or even free HW from existing ecosystem.


Copyright doesn't apply to physical hardware but it probably applies to hardware designs. There are various laws protecting ICs but I'm not sure they allow for copyleft-like hacks.

https://en.wikipedia.org/wiki/Integrated_circuit_layout_desi...


That's awesome! RISC-V seems like a well defined ISA that allows for lots of extensions. Seems like a perfect fit for a board like the parallella.


How does RISC-V get around the issue of patent minefields? For example, we've seen it's really difficult to make a competitive codec without infringing, and even when it's accomplished there is lingering doubt until it's challenged. I hope this situation is somehow much different.


I am sure the RISC-V core team has a much better answer. Their mission statement is a good start.

https://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-1...

The good news is that RISC has been around since the early 80's, so any killer hardware patent will likely have expired by now. Note that the RISC-V ISA is separate from any hardware implementation. If you get fancy with your micro-architecture, you can definitely trample on one of the thousands of micro-architecture patents filed across the industry.


This is why they have a RISC-V foundation, made up of big and small companies (Google, Oracle, HPE, etc.). Most (all?) of the RISC-V ISA is stuff that is out of patent and has a lineage going back more than 20 years. This issue is definitely important and it's not being ignored!


Patterson says they've tried their best to make sure it doesn't infringe on patents they know about but they obviously can't provide any guarantees.


I'm surprised to see a lot of RISC proponents still around, because I think it's quite clear that things didn't quite work out the way they thought it would --- the vision of cheap, simple, high-performance CPUs just didn't happen. Thus I'm not of the opinion that another "MIPS, but free" architecture is such a good idea.

Benchmarks are controversial but by most measures the RISC-V performance should be “good enough” for most use cases

In my experience, statements like this really mean "it's actually really bad, but we don't want to say that"... after all, who would call benchmarks "controversial" if they were winning? There do tend to be few comparative benchmarks, but here's one that shows how ARM and x86 are competitive, but MIPS is behind in energy efficiency:

http://www.extremetech.com/extreme/188396-the-final-isa-show...

Even modern ARM cores, an architecture that started out being very RISC, use microinstruction-based translation in their front end and probably have more in common with Intel's microarchitectures than MIPS and the like.

I think a "CISC-V" could be more interesting - something x86-like (AFAIK the remaining patents are only on the Pentium and above instruction set extensions, and those may expire soon, so 486 and below are now public-domain ISAs) with a small dense instruction set, but extended in a different direction. "x86, but free". ("ARM, but free" would likely not work for legal reasons.)


Market success != technical merit. However, if you're willing to consider ARM a RISC (well, almost), then there's your market success. Alpha completely dominated the performance segment for years, but it fell behind for reasons that have nothing to do with CISC vs. RISC. At the time, x86 compatibility mattered more than it does today.

For people designing their own cores for whatever reason (there can be many, research and commercial), RISC-V is very attractive because of all the reasons already given, but it's obviously not the only option.

RISC-V is a carefully designed compromise; it scales down to extremely cheap cores and up to superscalar ones. Like Alpha before it, extreme attention has been paid to avoiding features/choices that would be bad for OoOE implementations. Some examples:

- rs1, rs2, rd fields are always in the same location and all register sources and destinations are explicit (makes decoding faster and you can start fetching/renaming without having decoded)

- there are no branch delay slots

- instructions produce at most a single result

- no condition codes etc (dependencies are explicit)

- the sign-bit for all immediate fields is in a fixed location (cheaper sign-extension)

and so forth.

I'm also a big fan of the conditional branch instruction which unlike the Alpha can compare two registers.
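The fixed field placement is easy to see in the base 32-bit encoding: rd always sits at bits 11:7, rs1 at 19:15, rs2 at 24:20, and the immediate's sign bit is always bit 31, so a decoder can pull register names and start sign-extension before it has fully classified the instruction. A quick sketch:

```python
# Extract the fixed-position fields of a 32-bit RV32I instruction word.
# rd/rs1/rs2 occupy the same bit positions in every format that uses
# them, and the immediate sign bit is always bit 31, so these fields
# can be extracted speculatively, before full decode.
def fields(insn):
    return {
        "opcode": insn & 0x7F,          # bits 6:0
        "rd":     (insn >> 7)  & 0x1F,  # bits 11:7
        "rs1":    (insn >> 15) & 0x1F,  # bits 19:15
        "rs2":    (insn >> 20) & 0x1F,  # bits 24:20 (part of imm in I-type)
        "sign":   (insn >> 31) & 0x1,   # immediate sign, always bit 31
    }

# addi x1, x2, -1  ->  imm=0xFFF, rs1=2, funct3=0, rd=1, opcode=0x13
insn = (0xFFF << 20) | (2 << 15) | (0 << 12) | (1 << 7) | 0x13
print(fields(insn))
```

Note that for an I-type instruction like this the "rs2" bits are really part of the immediate; the point is that the hardware can extract all candidate fields in parallel and discard the irrelevant ones after decode.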


However, if you're willing to consider ARM a RISC (well, almost) then there's your market success.

ARM wasn't really a pure RISC from the beginning (e.g. multicycle instructions like LDM/STM, pre/post-increment addressing modes, built-in shifts), and I'd say the market success, especially more recently, is attributable to the fact that ARM cores are becoming more x86-like.

On the other hand, MIPS is the quintessential RISC, and it has enjoyed some success, but it was never really known for amazing performance or efficiency. I suspect RISC-V will be similar, and all the cheap Chinese tablets/phones/etc. that are currently using MIPS may switch to RISC-V instead, although many are unlicensed clones, so cost may not be a factor for them.

Alpha completely dominated the performance segment for years, but they fell behind for reasons that have nothing to do with CISC vs. RISC.

Alphas relied primarily on clock frequency to achieve their performance, and were very power-hungry as a result. This is absolutely the RISC philosophy of performance via simpler designs and increasing clock frequency, which stopped being viable long ago. A 200MHz 21064 can do 0.675 DMIPS/MHz and consumes 30W, or 0.0225 DMIPS/MHz/W; a Pentium (P5) at 100MHz has 1.88 DMIPS/MHz and consumes 10W, for 0.188 DMIPS/MHz/W.
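Those per-watt figures follow directly from the quoted specs (per-MHz throughput divided by power draw):

```python
# Reproduce the efficiency figures from the quoted specs:
# efficiency = (DMIPS per MHz) / watts.
def dmips_per_mhz_per_watt(dmips_per_mhz, watts):
    return dmips_per_mhz / watts

alpha_21064 = dmips_per_mhz_per_watt(0.675, 30)  # -> 0.0225
pentium_p5  = dmips_per_mhz_per_watt(1.88, 10)   # -> 0.188
print(alpha_21064, pentium_p5)
```

By this metric the P5 comes out roughly 8x more efficient per MHz per watt, which is the comparison the comment is resting on.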


This is getting off-topic, but ARM's success had nothing to do with RISC vs. CISC. It was that _Alpha_ engineers took their skills to the ARM ISA (which at the time had only very weak implementations). They created StrongARM, which made people sit up and take notice. Intel ironically inherited the technology and continued developing it under the XScale name. So, yes, Intel in part created their own ARM problem.

I feel I have to repeat again that none of this had to do with RISC vs. CISC. It had to do with implementation and market muscle.

"Alphas relied primarily on clock frequency to achieve their performance, and were very power-hungry as a result"

So was the P4 and it failed, but Intel had enough capital to take a power efficiency clue from Transmeta and turn the ship 180 degrees.

What killed the Alpha was three things: delays in getting the new models out, the cost of staying in this game, and of course, the arrival of Itanic.

EDIT: An aside: the 64-bit extension of ARM is a completely new and different ISA called AArch64 which incidentally is a lot more RISC-like than the original ARM. IOW, it looks like the ARM designers disagree with you.


What are your qualifications in computer architecture? You're arguing against pretty much everyone in the field.

> I'm surprised to see a lot of RISC proponents still around, because I think it's quite clear that things didn't quite work out the way they thought it would --- the vision of cheap, simple, high-performance CPUs just didn't happen. Thus I'm not of the opinion that another "MIPS, but free" architecture is such a good idea.

You seem to be implying that RISC failed. If you ask anyone who mattered in computer architecture, ranging from academics like Patterson, who literally wrote the most-used books in the field, to Intel fellows, they'll all tell you that the essential point the RISC folks were making was proven right. That point being that RISC requires less logic (=area, power) to implement, it's easier to write compilers for, processors have faster cycle times, and so on.

In the end, though, it turned out the market valued software compatibility over performance. That, plus good marketing and a phenomenally efficient supply chain and fabs, resulted in Intel winning. But don't confuse this with CISC winning or RISC losing. If Intel had to do a clean-slate design today, they'd do RISC themselves. In fact, Intel have a few proprietary microcontrollers hidden inside their chipsets and SoCs that were designed post-2005ish, and these have RISC architectures.

> In my experience, statements like this really mean "it's actually really bad, but we don't want to say that"... after all, who would call benchmarks "controversial" if they were winning? There do tend to be few comparative benchmarks, but here's one that shows how ARM and x86 are competitive, but MIPS is behind in energy efficiency:

The benchmark performance depends only on how much effort is put into optimizing the implementation. Intel have on average maybe 200 architects and 200 more designers tweaking their processors for performance for the last 30 years. It is completely absurd to expect a team of 10 or so grad students to compete with them.


You seem to be implying that RISC failed.

It did fail to deliver on all its promises.

RISC requires less logic (=area, power) to implement

True, but ultimately it is overall energy usage that is important. A tiny low-power CPU with less performance will take longer to complete a task than a larger faster higher-power one, meaning it consumes more energy. That's what the article I linked to shows.
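The energy-vs-power distinction here is just E = P × t. With hypothetical numbers, a core drawing twice the power but finishing three times sooner uses less total energy for the task:

```python
# Energy = power * time. A faster, higher-power core can still win on
# total task energy if it finishes enough sooner. The wattages and
# runtimes below are hypothetical, chosen only to show the trade-off.
def task_energy(power_watts, seconds):
    return power_watts * seconds  # joules

slow_low_power  = task_energy(1.0, 9.0)  # 1 W for 9 s -> 9 J
fast_high_power = task_energy(2.0, 3.0)  # 2 W for 3 s -> 6 J
print(slow_low_power, fast_high_power)
```

This is the "race to idle" argument: peak power alone doesn't decide which design is more energy-efficient.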

processors have faster cycle times

That's not necessarily a good thing, as Intel's failed NetBurst microarchitecture shows. Making a CPU with such short delays that it can run at upwards of 10GHz is futile, as power dissipation becomes a huge problem long before that.

In fact, Intel have a few proprietary microcontrollers hidden inside their chipsets and SoCs that were designed post-2005ish and these have RISC architectures.

If you're referring to the ARC4 in the Management Engine, I have a feeling that was chosen for reasons other than being RISC or otherwise.

It is completely absurd to expect a team of 10 or so grad students to compete with them.

I remember it said that RISC would be so simple and performant that such small teams could easily design CPUs which vastly outperform the big CISCs at much lower prices, so I don't think it's that absurd of an expectation.


The reason they're dancing around the performance figures is because a) synthetic benchmarks like SPEC and CoreMark are of questionable value, and b) they don't have chips in customers' hands quite yet, so the results can't be independently reproduced.

If you look at the current results for Rocket and BOOM, you'll find that these microarchitectures seem competitive with Cortex A5/A8 and Cortex A15 respectively at comparable frequencies.

In addition to this, they seem to have some significant amount of space savings on die. Power/performance ratios already seem to be better (likely due in part to smaller die area).

On a power-performance front, a bigger OoO RISC-V could likely be competitive in peak performance to an intel, though you'd have to find a market for that. There has also been a lot of discussion about a real vector machine extension, and that would likely wipe the floor with x86's packed SIMD if done right.


Much more importantly, an Intel design team has something like 1200 engineers working for 2-2.5 years on the next generation of an existing processor. I don't know about ARM, but I'd guess they probably have something like 200-300 people working on each revision. There's no way a team of 10 or so grad students can hope to out-optimize them.

Also SPEC isn't a synthetic benchmark, it contains real applications like bzip2 and gcc.


The problem with SPEC is it is a 1990's workstation benchmark suite... should my cell phone processor's design be guided by how quickly it can compile ia32 code? Or simulate quantum computers?


Sure, that's fair. I was just being pedantic about the precise meaning of a "synthetic" benchmark, which is a benchmark generated to mimic real code but isn't a real/useful application by itself.


But aren't modern Intel and AMD CPUs internally RISC(-like)? It seems like we have great compilers for CISC targets, and can make great RISC CPUs that process CISC instruction sets really well?

Perhaps the issue is that the internals of eg. modern Intel CPUs are locked up at Intel, so the missing step from CISC assembly to efficient RISC is missing (while we've got high-level/C/C++ to CISC covered)?


No they aren't internally RISC. On big Intel & AMD cores "ADD reg, mem" translates to a single uop. On Intel Atom even "ADD mem, reg" translates to a single uop. In fact, it is mostly legacy instructions (e.g. binary-coded decimal arithmetic) that are translated into multiple uops.


x86 uops are very wide, much wider than RISC instructions and more like VLIWs; P6 uops are 118 bits and I haven't easily found any description of the newer models but they are likely even wider.


The latest number I have is 157 bits in Intel Core (for a ROB entry).


Thinking about it, it is probably not difficult to create a x86 subset that is user mode compatible with most modern programs but lacks things like segmentation.


Just one question, is it pronounced "risk five" or "risk vee?"


Five.


Can RISC-V be implemented in the on-board Xilinx Zynq 7020 FPGA of the A101040 (Epiphany III) Parallella?


Yes, definitely. Berkeley's open-source Rocket[1] core can already be programmed onto the Zedboard and ZC706[2], which also use Zynq 7000 series FPGAs. Getting it to work on the Parallella should just be a matter of changing the clock constraints and pin settings and finding a configuration that will fit on the FPGA.

[1] https://github.com/ucb-bar/rocket-chip [2] https://github.com/ucb-bar/fpga-zynq


not to mention the smaller implementations like picorv32 and zscale/vscale.


Neat-o! Thanks :)


Definitely! Straightforward, especially if you use https://github.com/parallella/oh


Oh! Sweet!


Why do you want to do that? Isn't there a hard ARM core on the Zynq 7000s ?


Site seems slow and might go down: mirror

http://i.imgur.com/0FmTYcH.png

(Fullpage screenshot as Google cache lacked styling)



I'm still not sure whether RISC-V will become one of the mainstream microprocessors in the industry.

1. The core is still not solid.

2. Not many powerful debugging tools exist.

3. Chisel itself. Every engineer who is willing to use RISC-V has to understand the output of the Chisel source code in order to do ECOs or other low-level jobs.

4. The BSD license itself. It is both a strength and a weakness. There's nothing forcing revisions to be merged back into the main repository. That won't be much trouble if RISC-V remains as it is, but it will be a big problem if RISC-V evolves through its own development team.

I have hoped RISC-V would become a mainstream processor ever since I attended a lecture by Mr. Yunsup Lee 4 years ago. I really want to be able to say "I was wrong at that moment" after all.


There's a distinction between the RISC-V ISA and our open source processor implementations. The ISA is an open standard that anyone can make an implementation of. There are many RISC-V implementations out there. Most of them are not based on our RocketChip codebase and use Verilog, not Chisel.


Chisel? There are dozens of non-Chisel RISC-V cores out there already (and many more that are closed source and being used by companies like Bluespec and Rumble).


There's no way any of this will remain "free" though. Established players won't allow it, you'll have to license their patent portfolios to make anything at all.


Does anybody have a cached version of this? It keeps returning db errors.



Could you also please make an open FPGA, based on this design?

https://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-4...


I had the pleasure of meeting Professor Rajit Manohar of Cornell a few years ago. He was the founder of Achronix, an FPGA startup. He said:

- only 2% of the standard FPGA fabric does useful work

- the really expensive part to develop is the IO (too many standards)

Having an open FPGA is a good start, but it will only be truly useful if the IO and the optimization are brought up to date with modern FPGAs. Today's FPGAs are really structured ASICs.

I'm very much for an open FPGA, but let's not underestimate the investments made by FPGA companies to get to the current state of the art. Building platforms is very expensive [ref mythical man month].


SIMD and VLIW are the future of microprocessors, and unfortunately it doesn't seem like this ISA will be able to support them.


RISC-V has a vector mode that can be used for SIMD applications.

VLIW has been the future since the 80s, and we're still waiting for the magic wonder compilers that can actually spit out efficient VLIW code. Even GPUs have abandoned VLIW (AMD TeraScale) in favour of RISC (Nvidia, AMD GCN).


VLIW DSPs are in every phone. VLIW CPUs are almost certainly a bad idea and as to GPUs, AFAIK VLIW needs to coexist with barrel threading there which might create problems. But VLIW certainly has its place.


Most DSP archs came out of the 90's, and many of the cores today are a reflection of that trend (CEVA etc). I worked on the TigerSHARC DSP for 8 years and can tell you that from an implementation standpoint they can be a nightmare! Not sure they have a viable place long term from an economical perspective.


In DSPs, sure, but not in general-purpose CPUs or GPUs.


I suspect the DSP makers just use VLIW because of inertia. They probably don't have the money or incentive to revisit their old decisions.

Also wouldn't you say that most of the stuff that used to be implemented on DSPs is now moving into ASICs? I wouldn't be so sure that VLIWs are going to be around forever.


VLIW is many times cheaper (in terms of power and area) than OoO. It is not going anywhere from the low budget range.


But VLIW and OoO aren't the only two design points; there's renewed interest in traditional in-order vector processors. In any case, I do think the point of DSPs was to be area/power efficient for certain specialized algorithms, and a lot of these are just becoming ASICs/specialized accelerators today.


nVidia Kepler and Maxwell (i.e. the two latest archs ATM) use VLIW instruction encoding


Do they? I can't find anything about it. VLIW for GPUs is always only mentioned in the context of AMD TeraScale – which has been obsoleted in favour of a RISC architecture five years ago.


I guess that's because unlike AMD, nVidia doesn't officially document the GPU instruction set. But if you disassemble a .cubin with nvdisasm, you'll see that Kepler/Maxwell code is organized in bundles of 4/8 words, where the first word doesn't encode any instruction. Here is what Scott Gray of Nervana Systems, who developed a native assembler for Maxwell, writes about it[1]: "Starting with the Kepler architecture Nvidia has been moving some control logic off of the chip and into kernel instructions which are determined by the assembler. This makes sense since it cuts down on die space and power usage, plus the assembler has access to the whole program and can make more globally optimal decisions about things like scheduling and other control aspects. The op codes are already pretty densely packed so Nvidia added a new type of op which is a pure control code. On Kepler there is 1 control instruction for every 7 operational instructions. Maxwell added additional control capabilities and so has 1 control for every 3 instructions."

[1] https://github.com/NervanaSystems/maxas/wiki/Control-Codes
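Taking those quoted ratios at face value, the pure-control share of the instruction stream is easy to work out (a back-of-envelope sketch based only on the ratios above, not official NVIDIA figures):

```python
# Fraction of instruction words that are pure control, given
# "1 control per N operational instructions" (so 1 control word
# out of every N+1 words total).
def control_overhead(ops_per_control):
    return 1 / (ops_per_control + 1)

print(f"Kepler:  {control_overhead(7):.1%}")   # 1 in 8 words -> 12.5%
print(f"Maxwell: {control_overhead(3):.1%}")   # 1 in 4 words -> 25.0%
```

So Maxwell spends a quarter of its instruction-stream bandwidth on scheduling metadata, which is the die-space/power trade-off Gray describes.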


There's a chapter in the current RISC-V manual that explains how you could make a RISC-V-like VLIW ISA. But VLIW is most certainly NOT the "future". VLIW demands an even "mix" of instruction types, and that's largely incompatible with general-purpose application code.

And the concept of baking into your ISA what the designer believes is the "perfect functional unit mix" is an anti-pattern. What's the perfect mix depends on the benchmark, and it changes from basic block to basic block. A history of failed VLIW projects can attest to this. A dynamic superscalar is far superior, even in power-efficiency.


VLIW demands an even "mix" of instruction types, and that's largely incompatible with general-purpose application code.

The Itanium is a great example of this. Benchmarks that pushed the CPU to its limits looked amazing, but real-world performance with general-purpose code was awful. There just isn't enough parallelism in general-purpose code to justify extremely huge instruction bundles that will mostly be wasted with NOPs.


"VLIW demands an even "mix" of instruction types,and it's hugely incompatible with general-purpose application core."

Has anyone ever done a study / experiment of a VLIW with multiple hardware threads and how that would impact the need for an even mix?


The point of VLIW is to eliminate the hazard detection and scheduling logic required by normal superscalar, but SMT would require much of that logic to be added back. So VLIW can't really use SMT. VLIW could use FGMT (e.g. barrel processor) or SoEMT, but those can't fill the empty issue slots caused by bad instruction mix.


The studies I've seen have shown that the FU mix is heavily skewed and changing on every basic block, particularly when you offload the DLP to a more efficient vector/SIMD unit.

I'm not sure I see how MT would solve the mix problem, if each thread gets an issue cycle (and each thread itself has a bad mix).


I was just thinking that the bad mixes might on average fill in the gaps to match what the processor actually had for resources.


Even GPUs, which have arguably far more predictable instruction mixes, struggled massively to get a useful utilization on VLIW architectures – AMD tried two different ones from 2006 to 2011 before they finally gave up on the concept and started using RISC architectures like Nvidia had been using all the time.


From what I've been told, VLIW only really makes sense in some use cases such as DSP processors. With general-purpose computing it happens way too often that you can't find enough instructions that are independent of each other. For the cases where it is possible to execute instructions in parallel, you can make your CPU superscalar. The simple nature of the RISC-V ISA should make superscalar implementations easy and performant.
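To make the superscalar alternative concrete: instead of a compiler proving independence ahead of time, a simple dual-issue in-order front end checks dependences between adjacent instructions at runtime. A minimal sketch of that check in Python (register names are illustrative, not tied to the real RISC-V ABI):

```python
# Per-cycle check a simple dual-issue in-order superscalar performs in
# hardware: two adjacent instructions may issue in the same cycle only
# if the second one neither reads nor rewrites the first one's result.

def can_dual_issue(i1, i2):
    """Each instruction is (dest_reg, set_of_source_regs)."""
    d1, _ = i1
    d2, s2 = i2
    raw = d1 in s2    # read-after-write: i2 needs i1's result
    waw = d1 == d2    # write-after-write: both target the same register
    return not (raw or waw)

print(can_dual_issue(("x1", {"x5"}), ("x2", {"x5"})))  # True: independent
print(can_dual_issue(("x1", {"x5"}), ("x2", {"x1"})))  # False: RAW hazard
```

The same comparison logic scales to wider issue widths; RISC-V's fixed, regular instruction formats are what keep these comparators cheap.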


We already have a RISC-V superscalar out-of-order core, the Berkeley Out-of-Order Machine (BOOM).

https://github.com/ucb-bar/riscv-boom



Interesting project; no updates for a year. I wonder how the project is going?


We've recently released a couple of tech reports on Hwacha:

Hwacha Vector-Fetch Architecture Manual: https://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-2...

Hwacha Microarchitecture Manual: https://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-2...

Preliminary Evaluation Results: https://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-2...

M.S. Thesis on Mixed Precision in Hwacha: https://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-2...


I think my biggest concern is Big Blue. I'm sure they will keep an eagle eye on this and make sure that as soon as someone accidentally adds support that falls close to one of their patents, it will stifle the energy and progress of the community. The further they slide into the abyss, the more desperation will drive moves like this... but only time will tell.



