RISC creator is pushing open source chips (gigaom.com)
175 points by Tsiolkovsky on Aug 20, 2014 | 93 comments



Assuming Moore's Law is slowing down, and there's no rabbit about to be pulled out of a hat by Intel or others, the slowdown is going to significantly close the gap between what is possible on an $x billion budget and an $x million one. This makes it a great time for initiatives such as this to take place.

In 2001, an unexpected delay of 18 months could mean releasing a 1GHz part against competition running at 2GHz. These days, Intel's 32nm and 22nm chips (and 14nm, from what is known about Broadwell) are more or less identical from a consumer's perspective.

As an aside, Patterson's book (coauthored with Hennessy), Computer Architecture: A Quantitative Approach, is one of the best I've ever read. Right up there with SICP.


Well, there are two predictions you can think about, either of which could come true sometime in the future:

1. Nobody can put more than XXX transistors on a square millimeter (where XXX is some physical limit to chip material)

2. "Everybody" can put YYY transistors on a square millimeter (where YYY is a number sufficiently close to XXX so that the difference does not matter much)

The first means some stagnation in the chip market: not only does Moore's Law not apply anymore, but even a weak version of it ("chips tend to get faster/cheaper over time") no longer applies. But if that scenario becomes true, it doesn't necessarily mean that the democratization of the second scenario becomes true. The ability to put YYY transistors on a square millimeter could still be limited to a select few companies investing billions in the production of these chips. After all, how many fabs worldwide are currently capable of producing 45nm chips, which could be considered "commodity" [1]?

Is it really true that it becomes much cheaper to produce chips on smaller process nodes over time? Sure, it becomes cheaper since less original R&D is necessary, but is it reasonable to think that producing 20nm chips ever becomes a question of investing anything less than, say, hundreds of millions? Does the second scenario ever become true, at least in the foreseeable future?

I don't know much about chip fabs so I wonder if someone more knowledgeable could share some insight on this.

[1]: I genuinely don't know the answer to this, but I don't suppose it's very widespread?


If 2 keeps not happening — if producing custom ASICs remains expensive — it becomes increasingly attractive to produce cheap as-good-as-they-get FPGAs in massive quantities.

Ironically, FPGAs as endgame would be way more disruptive than an "ASICs to the people" endgame.

FPGA density, price/gate and power efficiency trail "native" hardware by constant factors of overhead, but it should be possible to reduce the gap compared to now. More FPGA volume means better economies; the hardware architecture could be optimized further; most importantly IMHO, the current design tools are heinously unfriendly and could benefit a lot from programmer attention (once programmers realize a more-or-less-finalized FPGA structure is the "new assembly" to optimize for).

I'm not sure if a constant factor of disadvantage would become very acceptable (because we'll drop the throw-faster-hardware-at-it mentality) or very unacceptable (because robots with FPGA brains always lose at high-frequency chess wrestling to robots with native brains).


Would ideally need to have some kind of open source FPGA catch on (and be manufactured with competitive performance). Companies like Xilinx and Altera are not at all keen on open sourcing their technology and tools. IMO, through their actions, they're shooting themselves in the foot, keeping FPGAs from becoming what GPGPU is right now (or something even better, in fact).


If I were to bet on which fabs were able to get to the final process node for traditional silicon transistors, my money would probably be on Intel and TSMC, since they have the most volume on advanced processes and spend the most money on new fabs and process research. And TSMC is the biggest commodity fab.

The reason it becomes cheaper is that your development costs for both the process and the design get amortized over longer and longer periods. At the same time architecture becomes the only way that some chips can be faster than others.

Though I guess I should point out that progress in chip performance isn't the same as putting more transistors on a square millimeter. One aspect of this is that more economical silicon processes such as TSMC's are better at cramming more transistors into a square millimeter than Intel's, but Intel's transistors tend to have better drive current - all at a given process node. The other aspect is that ever increasing transistor leakage means that processors might get to an era when they can't afford to keep all their transistors lit up at the same time[1].

[1]http://hopefullyintersting.blogspot.com/2012/02/dark-silicon...


Don't count IBM out either; they have certain chips with such high margins that they can stay fairly leading edge even with smaller volumes.


These days, Intel is an unusual model in that they own the whole 'stack', from processor design to fabrication, packaging, etc. To replicate that costs 10s of billions of dollars.

To get your own chip produced using a 'pure-play' fab (which, these days, basically means TSMC, and to a lesser extent Samsung and GloFo), however, is much cheaper. If you want a 20nm chip, you design your chip (basically the cost is just engineering effort + CAD tools..a few million $$), send it to TSMC (getting masks made is another few million $$), and they send you back chips.

Reality is slightly more complicated, in that you're using an integrator to manage the relationship with TSMC, the packaging company, the testing company, etc., but a startup with $10-20 million could probably scrape together a decent 20nm chip.

Once TSMC has built a new fab though, the longer it is 'relevant' the cheaper they can make the wafers, as they have more wafers to amortize the ~$10 billion or whatever it cost to build the fab over.


It's a confusing discussion (for me). It is true that "chip design" is somewhat democratized in that you don't need 100s of millions to design a chip and have it made on a competitive process. The thing I pondered was whether it will ever cost less than 100s of millions to make it yourself on a competitive process.

Although I guess the latter really isn't that relevant when it comes to the discussion of chip design.


It's kind of like fretting that it's expensive to write software if you have to invent your own language and compiler first. It would be, which is why no one does that.


Definitely agree about the quality of Hennessy and Patterson's Computer Architecture book.

I think the reason that the 32nm, 22nm and 14nm chips are identical from a consumer's perspective comes down to two issues.

One is marketing - we no longer have convincing benchmarks that are as persuasive as the GHz a chip would support. It was a questionable metric anyway, but it now isn't being pursued (for a whole number of reasons).

The second is an application space problem - consumer applications are no longer exploiting CPU improvements in the same manner. That's partially because we now get more parallelism rather than single threaded speed, but it's partly a testament to the success of the previous generation as to the breadth of tasks a current PC will happily perform in a reasonable time frame.


I think we're at a bridging moment. Most computing tasks we came up with over the last decades that were very hard to get done are now trivial with some optimization. Yet the general purpose ("universal") algorithms we dreamed of are still not in reach -- it's easy to think of an evolutionary algorithm or a huge neural net to do AI, but in practice universality implies (still) inaccessible search spaces. So hardware looks as if it's standing still.


Yes, the days of updating the computer every three years to keep up with progress are now gone.

Maybe when we finally get to quantum computers a similar cycle will happen, but to the dismay of OEMs the cycle is gone.

Even mobiles have already reached the same peak and are mostly driven by contract renewals nowadays, not features.


Disagree on the causes. We're limited by mobile energy storage.

Desktops are as powerful as anyone needs - largely because you just buy the appropriate size - at least at the consumer level.

Laptops, phones? Definitely not. But the push for more isn't being limited by our performance, it's being limited by energy demands. I'm shopping for a new laptop at the moment, and my overriding requirement is to get the most watt-hours of battery into a low-energy configuration.

And even there, to some extent the problem is also not where you'd think - you can take a modern i5 down to something like 1.5W idle consumption. Most of the losses are switching circuits or the screen.


The issue, as you correctly state, is power consumption and there we still have quite some room for improvement.

Other than that, we also have the same hybrid multi-core devices in our pockets that we have on our desks. Coupled with all the hardware sensors one can imagine.

So besides the power consumption point you raise, there is hardly anything else to improve at the hardware level.

Except maybe for things like a better voice recognition experience or holographic displays a la Star Wars.


> but to the dismay of OEMs the cycle is gone

Don't worry, they won't give up without a fight. I bet they'll try to push some artificial reasons for people to keep throwing away perfectly good machines every two years.


"Artificial" reasons instead of ... what? "Natural" reasons?


Yes, more or less. Things people invent to extract more money out of their customers, or to keep afloat a business model that is visibly incompatible with reality - like planned obsolescence on lightbulbs and fridges, or DRM.


Maybe they can put some really fragile components in a computer people carry around? Have them be really shiny and look great to help marketing too.


And then maybe they could make it impossible to repair the broken part for less than a third of a new device. "Tight integration streamlines the production, makes everything cheaper", they could say.


Like rewriting everything as a web app?


As someone who has watched the "RISC revolution" and the claims its proponents have been making since the 80s, I'm not so optimistic on what will happen. Initially it was all about increasing clock frequency to make up for increased instruction counts and decreased code density, but Intel's Netburst showed that this strategy was a dead end. Then the focus was on power efficiency, which is still a close battle today. I don't think it's a matter of budget; AMD has much less to spend than Intel, yet it makes x86 CPUs that are still quite competitive even if they're not as fast - the gap between them is nowhere near their differences in budget. It seems that things are more like "fast, cheap, power-efficient: pick two".

This related paper is an interesting read: http://research.cs.wisc.edu/vertical/papers/2013/isa-power-s...

I think x86 still has plenty of room for optimisation, and Intel are just doing it slowly according to market conditions. An open-source x86 core would be great - in fact, pre-MMX-level x86 is fully open, as all the patents on that have expired now.


I also lived through it, but I would say that RISC won in the end, just not exactly as its proponents thought.

x86 chips are no longer real CISC, as they moved the internal architecture to RISC, starting with the Pentium Pro.

And all the other processors that matter are also RISC.

The real losers were the VLIW proponents.


Sounds suspicious to claim victory while redefining terms. :)

I'm curious which processors that matter are all RISC. I mean, I realize there are a fair number of chips out there. I wouldn't have thought many of them really fit the original RISC mold. Sure, they have fewer instructions than some of the old behemoths. But that is a far cry from the aims of RISC.

There is also the rise in specialized processors. Combining that with the increased focus on consolidating everything into one chip, and much of the debate is just weird nowadays.


Aren't ARM and the majority of microcontrollers RISC?


I think that really depends on perspective. Calling micro-controllers RISC versus CISC sorta misses the point. They are of course reduced from general processors. Does that make them RISC, though?

Easy way to consider it: a GPU would probably not qualify as a RISC processor, but it also probably has fewer overall operations than a CISC processor. (Or, do they qualify nowadays?)

Now, ARM definitely is. It remains to be seen whether they can continue to be dominant once Intel enters the same arenas.


I've read arguments that CISC now has a speed advantage because its denser instruction encoding is more friendly on the cache. If true, then ARM's RISC instruction set would hold it back from ever competing with x86 on speed.


x86 doesn’t really have a dense instruction encoding. The old un-prefixed scalar instructions are dense, but they do much less work per instruction than SSE or AVX instructions, which have two or three byte prefixes (four with AVX-512). arm64 instructions are all four bytes, for comparison.

The upshot is that simple scalar code tends to be somewhat more compact on x86, and tuned vector code (likely to be found in perf-critical routines that are CPU-bound) tends to be somewhat more compact on arm.

More to the point, loop buffers and µop caches on the last couple generations of processors make encoding density mostly irrelevant to performance (though occasional pathological examples do still exist).


"More to the point, loop buffers and µop caches on the last couple generations of processors make encoding density mostly irrelevant to performance"

As in, the part where x86 processors get to fully enable the "operate as a RISC" processor mode.


Since ARM11 (ARMv6), cores have been running on Thumb 2 (16-bit instructions) and only jumping into regular 32-bit instructions when necessary. Before that there was the Thumb instruction set which according to [0] was introduced in 1994.

This makes ARM (especially ARMv7 and newer) a mixed setup like x86.

[0] - https://en.wikipedia.org/wiki/ARM_architecture#Thumb


Thumb-2 was only introduced with ARMv6T2 and didn't become standard until ARMv7. There are a few ARMv6 cores which implemented Thumb-2 (as opposed to Thumb-1), but they aren't in many commonly used products. The ARM1176, for instance, as used in the Raspberry Pi, is Thumb-1 only.


Granted.

I suppose my point is that ARM is going towards the same inbetween-ism that x86 has moved to.


From my experience certain applications, mainly server workloads, have a high instruction footprint. I-cache optimizations are necessary even on an x86 processor. So there might be some weight to this argument. I'd like to see the difference in behavior between the two architectures.


I think the point is not so much that RISC-V is better than x86, or that RISC-V with the same budget would be better than x86, etc. The point is that since RISC-V is so simple, you can produce _your_own_ customized CPUs with a very small budget. Like adding special instructions, a special memory hierarchy, a special on-chip network, whatever. It certainly made sense for digital signal processing, graphics and networking. It may make sense for other domains too.

These custom chips have the potential to be much more power efficient (for their special class of applications) than off-the-shelf general purpose chips.

With RISC-V it will be easier to start your own CPU design...


That's largely it, but it's not just that it's simple; like Alpha, the RISC-V ISA has been carefully crafted to avoid choices that burden implementations (be those simple or advanced).

A few examples of this (none of which are unique to RISC-V except in aggregation): no delayed-branch semantics, source and destination registers are always in a fixed location (see the store vs. load encoding) and are always explicit, the sign bit of immediate fields is always in the same location, etc.
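To make the fixed-field point concrete, here is a small illustration of my own (field positions per the base 32-bit RISC-V instruction formats; not code from the article or spec):

  #include <stdint.h>

  /* In every base 32-bit RISC-V format, opcode = bits 0-6, rd = bits 7-11,
   * rs1 = bits 15-19, rs2 = bits 20-24, and the sign bit of every immediate
   * is instruction bit 31. Because these never move, a decoder can pull out
   * the register specifiers and start sign-extending the immediate before
   * it even knows which format it is looking at. */
  static inline uint32_t rv_opcode(uint32_t insn)   { return  insn        & 0x7f; }
  static inline uint32_t rv_rd(uint32_t insn)       { return (insn >> 7)  & 0x1f; }
  static inline uint32_t rv_rs1(uint32_t insn)      { return (insn >> 15) & 0x1f; }
  static inline uint32_t rv_rs2(uint32_t insn)      { return (insn >> 20) & 0x1f; }
  static inline uint32_t rv_imm_sign(uint32_t insn) { return  insn >> 31; }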


In order to be as efficient as a more naïve RISC architecture, x86 has to be very optimized with an Intel-sized budget behind it. For a RISC core to be as performant as a modern x86, it has to become much more complex (and it starts to resemble an x86 core, without the x86 decoder in front of it).


> These days, Intel's 32nm, 22nm chips (and 14nm, from what is known about Broadwell) are more or less identical from a consumer's perspective.

That’s a little bit of a stretch. 32nm was Westmere -> Sandybridge, which doubled FP throughput and L1 load bandwidth, among other changes. Haswell doubled FP throughput again, doubled integer SIMD throughput, and doubled L1 load and store bandwidth, among several other significant changes, and there’s also been a steady ~10% per generation improvement in generic IPC. Consumers don’t care about those numbers, but they do care about some of the operations they enable getting faster. (In fairness, this often lags the release of the processors by a year or two as SW re-optimization is required to take advantage of new features; the full benefit of the changes isn’t available to consumers until some time after the processors are released, by which time we’ve all forgotten just how slow our old hardware was.)

The energy footprint of those processors has shrunken considerably in the same time, which is a huge improvement for portables.


>The energy footprint of those processors has shrunken considerably in the same time, which is a huge improvement for portables.

Yes and no. The Westmere, Sandy Bridge, Ivy Bridge, Haswell, Devil's Canyon line isn't really targeted for the mobile market. The shrinking power budget is caused by 2 things.

1) To stay competitive against 64bit ARM processors. Intel processors are barely used in mobile devices. But eventually ARM will attempt to transition into laptop/desktop/server and compete with Intel directly, this is where lower power comes into play.

2) Feature set. New features (functionality) on chip cost not just die space, but also power and heat. Heat and power are by far the most critical issues.

The second issue is comedically addressed in "The Slow Winter", a satirical document about hardware engineering: http://research.microsoft.com/en-us/people/mickens/theslowwi...

"The idea of creating more and more cores ran into an issue. Most people don't use their computer simulating 4 nuclear explosions while rendering Avatar in 1080p. They use their computer for precisely 10 things, of which 6 of things involve pornography.

The other issue was that the 600-core chip dubbed 'Hydra of Destiny' was so smart that its design document was the best chess player in the facility. The problem was it required its own dedicated coal-fired plant, and would run so hot that it would melt its way into the earth's core." - The Slow Winter (paraphrased)


"Portables" was intended to include laptops, at which the processors in question are absolutely targeted. Apologies if that was unclear.


You'll find very few workloads where Haswell's doubling of vector sizes actually doubles throughput. I mean, if your working set is already in your level-1 data cache it will, since you won't be bandwidth constrained in that case and you won't have the wide vector unit on long enough that the chip will have to start power throttling. This is a situation that comes up sometimes, but you won't see your program accelerate by anything close to 100% overall.


Except for any computation that can be tiled to use cache effectively so as to not be load-store bound, like FFTs, matrix multiplication, convolution, etc, which sound technical and not relevant to consumers, but actually underlie nearly everything that happens with images or sound on a CPU.

I write optimized compute libraries for a living. Haswell really was an enormous improvement for real workloads (actually, more than 2x for some integer image-processing tasks due to three-operand AVX2 instructions eliminating the need for more moves than the renamer could hide in some loops). Is everything magically 2x faster? No, of course not. Are a lot of important things significantly faster? Absolutely.
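For readers wondering what "tiled to use cache effectively" looks like in practice, here's a generic sketch (my own toy example, not code from any of these libraries):

  #include <stddef.h>

  /* Blocked (tiled) matrix multiply: C += A * B for n x n row-major
   * matrices. Working on TILE x TILE blocks keeps the data resident in
   * L1/L2, so the wide vector units stay fed from cache instead of DRAM. */
  #define TILE 64

  void matmul_tiled(size_t n, const float *A, const float *B, float *C) {
      for (size_t ii = 0; ii < n; ii += TILE)
          for (size_t kk = 0; kk < n; kk += TILE)
              for (size_t jj = 0; jj < n; jj += TILE)
                  for (size_t i = ii; i < ii + TILE && i < n; i++)
                      for (size_t k = kk; k < kk + TILE && k < n; k++) {
                          float a = A[i * n + k];
                          for (size_t j = jj; j < jj + TILE && j < n; j++)
                              C[i * n + j] += a * B[k * n + j];
                      }
  }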


Chicken, meet egg:

We did not include special instruction set support for overflow checks on integer arithmetic operations. Most popular programming languages do not support checks for integer overflow, partly because most architectures impose a significant runtime penalty to check for overflow on integer arithmetic and partly because modulo arithmetic is sometimes the desired behavior. [1]

Please Regehr, don't hurt them. [2]

[1] Spec, section 2.4 https://s3-us-west-1.amazonaws.com/riscv.org/riscv-spec-v2.0...

[2] http://blog.regehr.org/archives/1154


I happen to be implementing my own JavaScript JIT compiler and I can tell you that SpiderMonkey, V8 and my compiler all perform a lot of overflow checks. This is because in JavaScript, all numbers are doubles, but JITs typically try to represent them as integers where possible, because integer operations have lower latency.

Scheme and Common Lisp also rely on overflow checks to optimize small integers as "fixnums" instead of always using arbitrary precision arithmetic. Not having hardware support for overflow checks would complicate the implementation of many dynamic languages, and reduce their performance significantly.

Not sure what this guy was thinking. It can't be that hard to implement some overflow flag you can branch on; I mean, an adder basically produces that information for free, doesn't it? Seems like a poor design choice.
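To illustrate the kind of fast path a JIT emits for this, a rough sketch of my own (using the GCC/Clang __builtin_add_overflow intrinsic, not actual V8/SpiderMonkey code):

  #include <stdbool.h>
  #include <stdint.h>

  /* Small-integer fast path for a JavaScript a + b: try the 32-bit add
   * and only fall back to double arithmetic on overflow. With hardware
   * support this compiles to an add plus a single branch on the overflow
   * flag; without it, an extra compare sequence is needed on every add. */
  double js_add(int32_t a, int32_t b, bool *stayed_int) {
      int32_t sum;
      if (!__builtin_add_overflow(a, b, &sum)) {   /* GCC/Clang builtin */
          *stayed_int = true;
          return (double)sum;                      /* result still fits an int32 */
      }
      *stayed_int = false;
      return (double)a + (double)b;                /* deoptimize to double math */
  }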


Hmm; carry is free, not sure about overflow. But more generally, it's an extra output, for what would otherwise be one-output operations. Put it this way - if a lot of javascript operations could modify a piece of global state, that would be trivial on a trivial interpreter, but I bet it would add a lot of complexity to your JIT. Same goes for CPUs - no problem for simple implementations, but it would add complexity to an out-of-order or other clever microarchitecture.


  > Hmm; carry is free, not sure about overflow.
Overflow is just the final carry out XOR the second-last carry bit, so it's practically free.

Of course RISC-V doesn't have a carry bit either!
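For what it's worth, both flags can be reconstructed from the operands and the result when the ISA doesn't expose them; a quick sketch of my own for the 32-bit case:

  #include <stdint.h>

  /* Recovering the flags a hardware adder produces "for free" when the
   * ISA (like RISC-V) doesn't expose them. For r = a + b:
   *  - unsigned carry out of bit 31: the result wrapped below an operand
   *  - signed overflow: operands had the same sign, result has the other */
  static inline uint32_t add_carry(uint32_t a, uint32_t b) {
      uint32_t r = a + b;
      return r < a;
  }
  static inline uint32_t add_overflow(int32_t a, int32_t b) {
      uint32_t r = (uint32_t)a + (uint32_t)b;
      return (~((uint32_t)a ^ (uint32_t)b) & ((uint32_t)a ^ r)) >> 31;
  }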


The Mill architecture supports it by having multiple add instructions: add is wraparound add, addx returns a NaR [1], addw returns a double-width value. As the Mill stores the integer size in its registers, these instructions apply to all integer sizes and vectors, without needing separate instructions for each width (and for vectors).

[1] NaR will propagate an error, but trigger an exception if it's stored or branched on.


Yes, in this day and age, making a new ISA worse than the MIPS ISA (which had trap on integer overflow) with regards to integer overflow behaviour is really annoying.

IMHO this is cost saving going too far..

I wish that the Mill succeeds instead; at least it brings new things regarding security!


If you'd like to be paid to work on a fully open-source RISC-V SoC, we're hiring. See http://lowrisc.org


Please explain: What would be the benefit of an open source chip?

At the end of the day, the chip is a physical thing that has to be produced by someone. As long as there is no 3D printer which can build nano structures to make chips, I can only implement it in an FPGA. The main FPGA providers are either Altera or Xilinx and both are very expensive. So they are good for prototyping or very low volume. But for anything beyond that, I need a specific implementation. Okay, I could go to a foundry like Samsung (or others) and order according to my design. But even that requires a very high volume in the millions to make it affordable and viable from a business perspective, especially if it is intended for the IoT market. On the other side, I can buy ARM Cortex-M0 and Cortex-M1 chips from NXP and others for less than $1. They are powerful and their power consumption is very low.

Just to mention, the other day the WRTnode[1] was released for less than $25, or look at a Raspberry Pi Compute Module. Don't get me wrong: yes, I would like to see those with an open source chip. But would that really be a viable business case, outside the university context?

[1] http://wrtnode.com/


One reason that's mentioned in the article: paranoia. If the chip is open-source, you can at least theoretically audit its manufacturing to make sure no backdoor is being inserted.

Also, return to the taxpayer. To me, a publicly-funded verification of RISC-V would bring wider benefits than that of a proprietary ISA like ARM [1].

[1] http://www.cl.cam.ac.uk/~acjf3/arm/


If you want to write a paper exploring some new computer architecture feature it's very nice to have a real chip with an actual existing toolchain to use as a starting point.


A startup can make a better chip that runs the same binaries without asking anyone for permission.


What's the point of an open source operating system? Most people buy their computing device with the OS installed, so the cost is marginal.


You are not from the hardware business?

In hardware design you cannot build a "minimum viable product" and then do "continuous integration" from then on. A hardware product either works within its defined limits or it doesn't. Then, depending on your limitations, you need additional approval by the FCC, FDA, and such organizations within the US alone, and the same thing again with their counterparts in every other country. Chip design is then the next level up, because changing a mask later on, because you found a bug, means you produced a million chips just for the dust bin.

That means the entry barrier into hardware, especially chips, is very high.


Reading your comment fully (something I should have done), I see you address the differences in the analogy. Sorry.

Perhaps the cost situation will change over time. Perhaps, in coming decades, the ability to fab microchips will itself become something achievable for thousands instead of millions of dollars. And as for regulation: That only matters if you're selling. If you're researching or making it for personal use, the FCC doesn't (or shouldn't) matter.


> I can buy ARM Cortex-M0 and Cortex-M1 chips from NXP and others for less than $1

So imagine how cheap these new CPUs could be!


Arm licensing fees are about 1% - 2%, so these new chips would be less than $0.98?


Talking about the RISC-V Instruction Set Architecture

http://riscv.org/

Why would you need 128-bit addressing? Isn't a 64-bit address space plenty big? This isn't a "nobody'll need more than 640k" scenario, right?

https://gigaom2.files.wordpress.com/2014/08/requirements-tab...


There's motivation in the relevant part (RV128I Base Integer Instruction Set), starting on page 81 of the 2.0 ISA specification (https://s3-us-west-1.amazonaws.com/riscv.org/riscv-spec-v2.0...).

A choice quote:

It is not clear when a flat address space larger than 64 bits will be required. At the time of writing, the fastest supercomputer in the world as measured by the Top500 benchmark had over 1 PB of DRAM, and would require over 50 bits of address space if all the DRAM resided in a single address space. Some warehouse-scale computers already contain even larger quantities of DRAM, and new dense solid-state non-volatile memories and fast interconnect technologies might drive a demand for even larger memory spaces. Exascale systems research is targeting 100 PB memory systems, which occupy 57 bits of address space. At historic rates of growth, it is possible that greater than 64 bits of address space might be required before 2030.
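(Quick sanity check of those figures, my own arithmetic rather than anything from the spec:)

  #include <math.h>
  #include <stdio.h>

  /* Address bits needed for a flat byte-addressed space: ceil(log2(bytes)). */
  int main(void) {
      double sizes_pb[] = { 1, 100, 100000 };   /* 1 PB, 100 PB, 100 EB */
      for (int i = 0; i < 3; i++) {
          double bytes = sizes_pb[i] * 1e15;
          printf("%8.0f PB -> %2.0f address bits\n", sizes_pb[i], ceil(log2(bytes)));
      }
      return 0;   /* prints 50, 57 and 67 bits respectively */
  }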


> if all the DRAM resided in a single address space

Key point, and that's a huge "if". All large systems are NUMA, and trying to treat that like a uniform address space will be absolutely horrible because of the extreme latencies that arise.


You're correct of course, but I wanted to point out that people shouldn't dismiss everything >64 bit out of hand based on this reasoning. There are architectures that can make use of those extra bits, and not just for addressing a universe of RAM. The Transmeta processors weren't perfect or anything, but there's merit in the VLIW approach. The first generation Crusoe chip was a 128 bit part, and the second was 256 bit; this was a decade ago, in chips designed for ultra-light consumer laptops. As I understand it, some key people from Transmeta ended up at P.A. Semi, which of course was acquired by Apple in 2008. I wouldn't be at all surprised if we're talking about VLIW architectures again by 2016 or at least 2020.

- https://en.wikipedia.org/wiki/Transmeta_Efficeon

- https://en.wikipedia.org/wiki/Very_long_instruction_word


1) Instruction size is irrelevant to address size

2) Yes Transmeta was VLIW internally, but I see that as an implementation-detail over other forms of superscalar; either way you have a linear stream of instructions generated by the compiler, with hardware turning that into parallel execution by the CPU at runtime. Calling that "VLIW" is about as interesting as calling a modern x86 "RISC."


I heard a rumor that the technical people from PASemi left after the acquisition. Does anyone have a reliable source to substantiate (or refute) that?


Most NUMA machines are single address space, and it's a very handy thing to have. You can run general purpose code on the machine for things that are not perf critical but need to get done.


Clutching at straws here, but...

One use case for 128 bit addressing is for a single-address-space OS running on a cluster. http://en.wikipedia.org/wiki/Single_address_space_operating_...

Pointers can have 32 bits for the IP address of the hosting node and 64 bits for the address within the node, and 32 bits spare for various flags.
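For illustration, a sketch of how such a fat pointer could be packed (my own hypothetical layout, just following the split above):

  #include <stdint.h>

  /* Hypothetical 128-bit "cluster pointer": 64 bits of address within the
   * hosting node, 32 bits of node id (e.g. an IPv4 address), 32 bits of flags. */
  typedef struct {
      uint64_t local;   /* bits  0..63  : address within the node */
      uint64_t meta;    /* bits 64..95  : node id; bits 96..127: flags */
  } cluster_ptr;

  static inline cluster_ptr make_ptr(uint32_t node, uint64_t local, uint32_t flags) {
      cluster_ptr p = { local, (uint64_t)node | ((uint64_t)flags << 32) };
      return p;
  }
  static inline uint32_t ptr_node(cluster_ptr p)  { return (uint32_t)p.meta; }
  static inline uint64_t ptr_local(cluster_ptr p) { return p.local; }
  static inline uint32_t ptr_flags(cluster_ptr p) { return (uint32_t)(p.meta >> 32); }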


I have no idea if this is technically feasible, but it sounds like a sweet idea!


With the advent of memristor-based memory it'll be possible today (at extremely high cost, on the order of a medium nation's GDP) to have on the order of petabytes of what will functionally be RAM. Only ~6-7 tick/tock cycles are necessary for that to expand beyond the 64-bit address space (~10^19 bytes).


How long is a tick/tock cycle? A few years? 3 * 7 = 21. So we'd need to prep for going beyond 64 bits some time before 2035? Which chimes with the quote "At historic rates of growth, it is possible that greater than 64 bits of address space might be required before 2030" that 'unwind' kindly gives in a sister comment.

Seems like good forward planning but a little premature, no? These features must come at some cost. Surely you could have a re-vamp sometime around 2025?


Why in the world would you plan a new architecture today, that will take years - easily a decade or more - to gain enough traction to be interesting, only to have to re-vamp it shortly afterwards?

We have off the shelf x86 machines today that can fit enough memory to use about 43 bits to address it (at least in theory - depending on DIMM availability; 1TB/2TB is trivially available on the other hand). Now add in size of readily viable storage arrays or even network file systems and expect some people to want to be able to memory map "unreasonably" large files, and suddenly we're into 50+ bits today.

It'd seem crazy to me if someone were to start planning a new processor architecture today without planning for addresses greater than 64 bits.


64 bits is a hell of a lot larger than 48 bits. You could make the argument that humanity will never need more than 64 bits, no matter how many trillions of years our race exists and uses computers.


You don't necessarily have to use every bit of that address space for a large address space to be useful.

The Mill architecture, for instance, requires each program to live in its own area of the address space, as this helps make the caches faster (the TLB doesn't have to come before the cache). This still leaves a comfortable amount of space in the 64-bit address space... today. But maybe not in the future.


Still unlikely there will be enough apps to matter? Every single bit in an address is a doubling of available space, remember.


The number of apps needed for it to matter can be measured in the single digits for there to be demand to pay for it, assuming it's the right single digits...

It's quite possible - even likely - that low end systems will not be 128 bits anytime soon. 8 bit microprocessors are still selling in large volumes for embedded use. But we're about to see 64 bit entering phones this year, because it is becoming necessary, or at least more convenient than not.

All the evidence is that we're not heading towards a slowdown in growth in storage requirements anytime soon. If anything, the existence of super-computers that are spread over huge clusters instead of being a single, tightly integrated system indicates that there is some level of demand for systems several orders of magnitude larger than the current largest off the shelf systems even today, at the right price.

The top end of the market has increased by at least 10 bits over the last 11-12 years alone. At the current storage growth rates, we'll hit the 64 bit limit on single server systems sometime in the next 10-20 years when factoring in memory mapped IO; sooner for single-system-image clusters.


I don't know. That address space is very much larger than all the storage ever produced in human history.


> I don't know. That address space is very much larger than all the storage ever produced in human history.

Not a chance. 64 bit is ca 16 exabytes.

Currently, the shipping volume of harddrives is about 500 million units/year. If we're generous and say that their average storage size is only 100GB, despite the large number of models in the 1TB-5TB segment, then that's 50 million TB/year, or about 50 exabytes of harddrive capacity per year. In reality it's likely much higher, and rising rapidly.

Yes, the number sounds big, but so did 1TB just a few years ago. And 1GB just a few years before that. It's not that long ago we were marvelling over even being able to buy 20MB hd's for home use. The number may sound outrageous, but my experience based on actual product availability is that we should expect a factor of 1000+ rises in storage capacity per 10-15 years, and I see no evidence to justify a slowdown.

And increases in capability cause changes in how we engineer things. When petabyte sized databases become possible for more people at reasonable price points, you'll see a lot of people that previously "made do" with terabyte sized databases find all kinds of uses for extra analysis etc., or simply storing more intermediate stages and being more wasteful because we can.


AH - counting disk space. Yes, if you mapped all that into one computer's address space you can make the math work out.


I've made the point several times that people do in fact expect to be able to mmap() files far bigger than physical memory on larger systems.

Purely for RAM we can survive with 64 bit for maybe a decade extra.


That's not true at all... A 64 bit system can address up to 16 exabytes of data... Google's datacenters (in 2013) estimated data storage was 15 exabytes.


And you could say the same about 32 bits when the NORD-5 was introduced.


We are looking at 50+ bits today if you want to mmap the largest off the shelf files you'll be able to store on local disk on some of the servers available even from the average system integrator. 47-48bits of physical RAM for shared-memory setups from companies like SGI, possibly more.

And it's not that big a leap: 15 years ago, I worked mostly on systems with <1GB RAM, and where, to get a decent-performance 1TB+ disk array, we were looking at a fridge-sized monstrosity. Today, my laptop has more than that. Even my phone is closing in: 1GB RAM + 64GB storage (and some mid-range Chinese phones today advertise support for 256GB SD cards, so it is already obsolete).

15 years before that again - 1985 - my machine had 64KB of RAM and 176KB floppies. A couple of years later I finally got 1MB of RAM and a 20MB HD.

So I don't consider it unreasonable to assume that we'll have systems in the 1PB RAM range and 1000+ PB storage range by 2030. That's about 50 bits for physical memory alone, or more like 60 bits to memory map that much secondary storage.

That's assuming no shared memory eating address space for example.

For most people, it will not be that relevant for a few more years, just like it took several years from 32bit became an issue on servers until it mattered for home computers, and like how 32 bit is first now becoming an issue for mobile.

But think about that for a second: We don't need 64 bit for our mobile phones. We do nothing on them we couldn't find ways to shoehorn into a 32bit address space for decades to come. But going to 64bit is convenient. It lets us shove 8GB or 16GB RAM or more into future systems and not have to think so much about memory efficiency. So we're going there.

Thus the idea that "humanity will never need more than 64 bits" I think is pretty much ridiculous, because it puts things on its head: It's not that we will "need" it, but that we will easily find ways of making use of it if we can. E.g. direct addressing all your storage is convenient in all kinds of ways. Storing all your home videos uncompressed, unedited, in 8K resolutions, in 3D, at a higher frame rate, becomes convenient when there's enough storage. Being able to write apps that can mmap multi PB monstrosity video projects instead of worrying about disk buffering becomes convenient when it's possible to do it.

It's not that long since I used to consider databases of a few hundred MB big. Now I throw around multi GB databases on a daily basis, and I know they are small for a lot of people, who deal with individual databases in the TB or PB size without blinking.


64-bit ints, you mean? A 64-bit signed integer is massive. Not the 64-bit address space, though; 64 bits is only ~18 exabytes of information. I'm very sure we'll find a need to load that much data into RAM eventually.


No, I meant 64-bit pointers are large enough to handle every piece of memory ever manufactured in human history.


And at the time of the first 32-bit processors you could potentially say the same about 32-bit pointers.


IPv6-based addressing? :)


From the article: "Popular chip architectures historically have been locked down behind strict licensing rules by companies such as Intel, ARM and IBM (although IBM has opened this up a bit for industry partners with its OpenPower foundation)."

IBM's new openness isn't really open at all. It's just what ARM has always been doing: they allow you to pay them a lot of money so you can use their ISA in your CPU.


Maybe a bit OT, but I've just started learning programming for MMIX, Donald Knuth's RISC computer. I've been wondering when, or if, it could one day be implemented in hardware.


This paper http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-14... pointed to in the article posted here claims higher performance per MHz and lower power consumption than ARM in table 2.

Still, that requires some chip maker to build a SoC around a RISC-V CPU that attains these efficiencies in the real world.

The paper makes these arguments for RISC-V:

• Greater innovation via free-market competition from many more designers, including open vs. proprietary implementations of the ISA.

• Shared open core designs, which would mean shorter time to market, lower cost from reuse, fewer errors given many more eyeballs, and transparency that would make it hard, for example, for government agencies to add secret trap doors.

• Processors becoming affordable for more devices, which helps expand the Internet of Things (IoT), whose devices could cost as little as $1.

The first point is not very concrete. China has long had some of their own MIPS-based RISC CPU designs, and they are most likely to act on the transparency issue. That leaves super-cheap processors for IoT. ARM may be able to deliver pricing and value that's better than free.

And all this assumes very low friction in the form of, say, Android adding this ISA to the standard set of compilation targets for native code, and, to ART pre-compilation.


Perhaps the most important contribution Linux has made to the world is that because of its impact, you can run an OS on nearly any instruction set architecture (ISA). That drops a pretty huge barrier in terms of getting to something from nothing. On RISC-V though, unless it was some sort of SoC (which most people won't build) I don't see the impact. But carrying that line a bit further ...

One of the challenges that Microchip has faced was that the affinity for C that rival Atmel's ATmega architecture had meant Microchip lost a few significant design wins (Arduino perhaps the most serious). They could use RISC-V to try to offset the Atmel SAM series. But other than that I don't see the motivation for folks not to use ARM; granted, a full processor license would be expensive, but if you are looking at volumes where that would be an advantage, it isn't that expensive.


Is licensing an ARM core really a barrier in a business/community venture, once you take into account the enormous costs of just getting a fab to make your chip?

I feel like the fabbing cost is so high, that at that scale, the CPU license fee is really nothing.

Correct me if I am wrong. This is coming from the mind of someone who knows next to nothing about how hardware is really made.


I think the bigger consideration is that the ARM IP is encumbered. You can't tweak it and publish the changes you made.


Exactly my thoughts. Licensing the ARMv8 is cheap; getting it to fab is expensive. And the current trend is that more advanced nodes will be more expensive, hence the move to smaller nodes will no longer be as much about getting cheaper. That is, until 450mm wafers come out, which still don't have a proper schedule.


David gave a nice lecture on RAID over a decade ago when I was taking a computer architecture class at Stanford.

Perhaps the next edition of his textbook (the bible of computer architecture) should use RISC-V; it would probably help as a learning aid and spread the gospel about RISC-V.

Of course, it's possible that the current edition of the text already does this.


Every time I hear "RISC Architecture" I think of the movie Hackers.



