Assuming Moore's Law is slowing down, and there's no rabbit about to be pulled out of a hat by Intel or others, that slowdown is going to significantly close the gap between what is possible on an $x billion budget and what is possible on an $x million one. This makes it a great time for initiatives such as this to take place.

In 2001, an unexpected delay of 18 months could mean releasing against competition running at 2GHz instead of 1GHz. These days, Intel's 32nm, 22nm chips (and 14nm, from what is known about Broadwell) are more or less identical from a consumer's perspective.

As an aside, Patterson's book (coauthored with Hennessy), Computer Architecture: A Quantitative Approach, is one of the best I've ever read. Right up there with SICP.




Well, there are two predictions you can think about that could come true sometime in the future:

1. Nobody can put more than XXX transistors on a square millimeter (where XXX is some physical limit to chip material)

2. "Everybody" can put YYY transistors on a square millimeter (where YYY is a number sufficiently close to XXX so that the difference does not matter much)

The first means some stagnation in the chip market: not only does Moore's Law no longer apply, but neither does even a weak version of Moore's Law ("chips tend to get faster/cheaper over time"). But even if that scenario comes true, it doesn't necessarily mean the democratization of the second scenario follows. The ability to put YYY transistors on a square millimeter could still be limited to a select few companies investing billions in the production of these chips. After all, how many fabs worldwide are currently capable of producing 45nm chips, which could be considered "commodity" [1]?

Is it really true that it becomes much cheaper to produce chips on smaller process nodes over time? Sure, it becomes cheaper since less original R&D is necessary, but is it reasonable to think that producing 20nm chips will ever be a question of investing anything less than, say, hundreds of millions? Does the second scenario ever become true, at least in the foreseeable future?

I don't know much about chip fabs so I wonder if someone more knowledgeable could share some insight on this.

[1]: I genuinely don't know the answer to this, but I don't suppose it's very widespread?


If 2 keeps not happening — if producing custom ASICs remains expensive — it becomes increasingly attractive to produce cheap as-good-as-they-get FPGAs in massive quantities.

Ironically, FPGAs as endgame would be way more disruptive than an "ASICs to the people" endgame.

FPGA density, price/gate and power efficiency trail "native" hardware by constant factors of overhead, but it should be possible to reduce the gap compared to now. More FPGA volume means better economies; the hardware architecture could be optimized further; most importantly IMHO the current design tools are heinously unfriendly and could benefit a lot from programmer attention (once programmers realize a more-or-less-finalized FPGA structure is the "new assembly" to optimize for).

I'm not sure if a constant factor of disadvantage would become very acceptable (because we'll drop the throw-faster-hardware-at-it mentality) or very unacceptable (because robots with FPGA brains always lose at high-frequency chess wrestling to robots with native brains).


We would ideally need some kind of open-source FPGA to catch on (and be manufactured with competitive performance). Companies like Xilinx and Altera are not at all keen on open sourcing their technology and tools. IMO, through their actions, they're shooting themselves in the foot, keeping FPGAs from becoming what GPGPU is right now (or something even better, in fact).


If I were to bet on which fabs will be able to get to the final process node for traditional silicon transistors, my money would probably be on Intel and TSMC, since they have the most volume on advanced processes and spend the most money on new fabs and process research. And TSMC is the biggest commodity fab.

The reason it becomes cheaper is that your development costs for both the process and the design get amortized over longer and longer periods. At the same time architecture becomes the only way that some chips can be faster than others.

Though I guess I should point out that progress in chip performance isn't the same as putting more transistors on a square millimeter. One aspect of this is that more economical silicon processes such as TSMC's are better at cramming more transistors into a square millimeter than Intel's, but Intel's transistors tend to have better drive current - all at a given process node. The other aspect is that ever increasing transistor leakage means that processors might get to an era when they can't afford to keep all their transistors lit up at the same time[1].

[1] http://hopefullyintersting.blogspot.com/2012/02/dark-silicon...


Don't count IBM out either; they have certain chips with such high margins that they can stay fairly leading edge even with smaller volumes.


These days, Intel is an unusual case in that they own the whole 'stack', from processor design to fabrication, packaging, etc. Replicating that costs tens of billions of dollars.

To get your own chip produced at a 'pure-play' fab (which, these days, basically means TSMC, and to a lesser extent Samsung and GloFo), however, is much cheaper. If you want a 20nm chip, you design your chip (basically the cost is just engineering effort + CAD tools, a few million $$), send it to TSMC (getting masks made is another few million $$), and they send you back chips.

Reality is slightly more complicated, in that you're using an integrator to manage the relationship with TSMC, the packaging company, the testing company, etc., but a startup with $10-20 million could probably scrape together a decent 20nm chip.

Once TSMC has built a new fab, though, the longer it stays 'relevant' the cheaper they can make the wafers, as they have more wafers over which to amortize the ~$10 billion or whatever it cost to build the fab.
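
Back-of-the-envelope sketch of that amortization effect (the wafer volume and lifetime here are made-up illustration numbers, not TSMC's actual figures):

    /* Toy amortization sketch; wafer volume and fab lifetime are pure guesses. */
    #include <stdio.h>

    int main(void) {
        double fab_cost  = 10e9;   /* ~$10B to build the fab (figure from above) */
        double wafers_mo = 50e3;   /* assumed wafer starts per month */
        double years     = 5.0;    /* assumed useful life of the node */
        double wafers    = wafers_mo * 12 * years;
        printf("capital charge per wafer over %.0f years: $%.0f\n",
               years, fab_cost / wafers);
        /* Keep the fab "relevant" twice as long and the per-wafer charge halves. */
        printf("capital charge per wafer over %.0f years: $%.0f\n",
               years * 2, fab_cost / (wafers * 2));
        return 0;
    }

At those made-up numbers it's roughly $3,300 of capital charge per wafer, falling to ~$1,700, before any marginal production cost.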


It's a confusing discussion (for me). It is true that "chip design" is somewhat democratized in that you don't need 100s of millions to design a chip and have it made on a competitive process. The thing I pondered was whether it will ever cost less than 100s of millions to make it yourself on a competitive process.

Although I guess the latter really isn't that relevant when it comes to the discussion of chip design.


It's kind of like fretting that it's expensive to write software if you have to invent your own language and compiler first. It would be, which is why no one does that.


Definitely agree about the quality of Hennessy and Patterson's Computer Architecture book.

I think the reason the 32nm, 22nm and 14nm chips look identical from a consumer's perspective comes down to two issues.

One is marketing - we no longer have benchmarks as persuasive as the GHz a chip would support. It was a questionable metric anyway, but it isn't being pursued now (for a number of reasons).

The second is an application space problem - consumer applications are no longer exploiting CPU improvements in the same manner. That's partially because we now get more parallelism rather than single-threaded speed, but it's partially a testament to the success of the previous generation in the breadth of tasks a current PC will happily perform in a reasonable time frame.


I think we're at a bridging moment. Most computing tasks we came up with over the last decades that were very hard to get done are now trivial with some optimization. Yet the general-purpose ("universal") algorithms we dreamed of are still not in reach -- it's easy to imagine an evolutionary algorithm or a huge neural net doing AI, but in practice universality implies (still) inaccessible search spaces. So hardware looks as if it's standing still.


Yes, the days of updating your computer every three years to keep up with progress are now gone.

Maybe when we finally get to quantum computers a similar cycle will happen, but to the dismay of OEMs the cycle is gone.

Even mobiles have already reached the same peak and are mostly driven by contract renewals nowadays, not features.


Disagree on the causes. We're limited by mobile energy storage.

Desktops are as powerful as anyone needs - largely because you just buy the appropriate size - at least at the consumer level.

Laptops, phones? Definitely not. But the push for more isn't being limited by performance, it's being limited by energy demands. I'm shopping for a new laptop at the moment, and my overriding requirement is to get the most Wh of battery into a low-energy configuration.

And even there, to some extent the problem is not where you'd think - you can take a modern i5 down to something like 1.5W idle consumption. Most of the losses are in the switching circuitry or the screen.


The issue, as you correctly state, is power consumption and there we still have quite some room for improvement.

Other than that, we now have the same hybrid multi-core devices in our pockets that we have on our desks. Coupled with all the hardware sensors one can imagine.

So besides the power consumption point you raise, there is hardly anything else to improve at the hardware level.

Except maybe for things like a better voice recognition experience or holographic displays a la Star Wars.


> but to the dismay of OEMs the cycle is gone

Don't worry, they won't give up without a fight. I bet they'll try to push some artificial reasons for people to keep throwing away perfectly good machines every two years.


"Artificial" reasons instead of ... what? "Natural" reasons?


Yes, more or less. Things people invent to extract more money out of their customers, or to keep afloat a business model that is visibly incompatible with reality - like planned obsolescence in lightbulbs and fridges, or DRM.


Maybe they can put some really fragile components in a computer people carry around? Have them be really shiny and look great to help marketing too.


And then maybe they could make it impossible to repair the broken part for less than a third of a new device. "Tight integration streamlines the production, makes everything cheaper", they could say.


Like rewriting everything as a web app?


As someone who has watched the "RISC revolution" and the claims its proponents have been making since the 80s, I'm not so optimistic about what will happen. Initially it was all about increasing clock frequency to make up for increased instruction counts and decreased code density, but Intel's NetBurst showed that this strategy was a dead end. Then the focus was on power efficiency, which is still a close battle today. I don't think it's a matter of budget; AMD has much less to spend than Intel, yet it makes x86 CPUs that are still quite competitive even if they're not as fast - the gap between them is nowhere near their difference in budget. It seems that things are more like "fast, cheap, power-efficient: pick two".

This related paper is an interesting read: http://research.cs.wisc.edu/vertical/papers/2013/isa-power-s...

I think x86 still has plenty of room for optimisation, and Intel are just doing it slowly according to market conditions. An open-source x86 core would be great - in fact, pre-MMX-level x86 is fully open, as all the patents on that have expired now.


I also lived through it, but I would say that RISC won in the end, just not exactly as its proponents thought.

x86 CPUs are no longer real CISC, as the internal architecture moved to RISC-like micro-ops, starting with the Pentium Pro.

And all the other processors that matter are also RISC.

The real losers were the VLIW proponents.


Sounds suspicious to claim victory while redefining terms. :)

I'm curious which processors that matter are all RISC. I mean, I realize there are a fair number of chips out there. I wouldn't have thought many of them really fit the original RISC mold. Sure, they have fewer instructions than some of the old behemoths. But that is a far cry from the aims of RISC.

There is also the rise of specialized processors. Combine that with the increased focus on consolidating everything into one chip, and much of the debate is just weird nowadays.


Aren't ARM and the majority of micro-controllers RISC?


I think that really depends on perspective. Calling micro-controllers RISC versus CISC sorta misses the point. They are of course reduced from general processors. Does that make them RISC, though?

Easy way to consider it: a GPU would probably not qualify as a RISC processor, but it also probably has fewer overall operations than a CISC processor. (Or do they qualify nowadays?)

Now, ARM definitely is. It remains to be seen whether they can continue to be dominant once Intel enters the same arenas.


I've read arguments that CISC now has a speed advantage because its denser instruction encoding is more friendly on the cache. If true, then ARM's RISC instruction set would hold it back from ever competing with x86 on speed.


x86 doesn’t really have a dense instruction encoding. The old un-prefixed scalar instructions are dense, but they do much less work per instruction than SSE or AVX instructions, which have two or three byte prefixes (four with AVX-512). arm64 instructions are all four bytes, for comparison.

The upshot is that simple scalar code tends to be somewhat more compact on x86, and tuned vector code (likely to be found in perf-critical routines that are CPU-bound) tends to be somewhat more compact on arm.
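
To put rough numbers on that (the bytes below are from my own hand-assembling, so double-check them before relying on them anywhere):

    /* Encoded instruction sizes; the byte counts are the point, not the program. */
    #include <stdio.h>

    int main(void) {
        unsigned char add_eax_ebx[] = {0x01, 0xD8};             /* x86 scalar add: 2 bytes */
        unsigned char vaddps_ymm[]  = {0xC5, 0xF4, 0x58, 0xC2}; /* vaddps ymm0,ymm1,ymm2: 4 bytes (2-byte VEX prefix) */
        /* An AVX-512 version carries a 4-byte EVEX prefix (6 bytes total), while
           every arm64 instruction, scalar or SIMD, is a fixed 4 bytes. */
        printf("x86 scalar add: %zu bytes, AVX add: %zu bytes, arm64 anything: 4 bytes\n",
               sizeof add_eax_ebx, sizeof vaddps_ymm);
        return 0;
    }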

More to the point, loop buffers and µop caches on the last couple generations of processors make encoding density mostly irrelevant to performance (though occasional pathological examples do still exist).


"More to the point, loop buffers and µop caches on the last couple generations of processors make encoding density mostly irrelevant to performance"

As in, the part where x86 processors get to fully enable the "operate as a RISC" processor mode.


Since ARM11 (ARMv6), cores have been running on Thumb 2 (16-bit instructions) and only jumping into regular 32-bit instructions when necessary. Before that there was the Thumb instruction set which according to [0] was introduced in 1994.

This makes ARM (especially ARMv7 and newer) a mixed setup like x86.

[0] - https://en.wikipedia.org/wiki/ARM_architecture#Thumb


Thumb2 only became standard with ARMv7. There are a few ARMv6 cores which implemented Thumb2 (as opposed to Thumb1), but they aren't in many commonly used products. The ARM1176 for instance, as used in the Raspberry Pi, is Thumb1 only.


Granted.

I suppose my point is that ARM is going towards the same inbetween-ism that x86 has moved to.


In my experience, certain applications, mainly server workloads, have a high instruction footprint. I-cache optimizations are necessary even on an x86 processor. So there might be some weight to this argument. I'd like to see the difference in behavior between the two architectures.


I think the point is not so much that RISC-V is better than x86, or that RISC-V with the same budget would be better than x86, etc. The point is that since RISC-V is so simple, you can produce _your_own_ customized CPUs with a very small budget. Like adding special instructions, a special memory hierarchy, a special on-chip network, whatever. It has certainly made sense for digital signal processing, graphics and networking. It may make sense for other domains too.

These custom chips have the potential to be much more power efficient (for their special class of applications) than off-the-shelf general purpose chips.

With RISC-V it will be easier to start your own CPU design...


That's largely it, but it's not just that it's simple; like Alpha, the RISC-V ISA has been carefully crafted to avoid choices that burden implementations (be they simple or advanced).

A few examples of this (none of which are unique to RISC-V except in aggregate): no delayed-branch semantics; source and destination registers are always in fixed locations (see the store vs. load encoding) and are always explicit; the sign bit of immediate fields is always in the same location; etc.
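
To make that concrete, here's a toy decode of an RV32 R-type instruction. The point is that the register fields (and the immediate sign bit, always insn[31]) sit at fixed bit positions regardless of opcode, which keeps the decoder trivial. This is my reading of the spec, so treat it as a sketch:

    #include <stdint.h>
    #include <stdio.h>

    /* Extract bits [hi:lo] of a 32-bit instruction word. */
    static uint32_t field(uint32_t insn, int lo, int hi) {
        return (insn >> lo) & ((1u << (hi - lo + 1)) - 1);
    }

    int main(void) {
        uint32_t insn   = 0x003100B3u;           /* add x1, x2, x3 */
        unsigned opcode = field(insn, 0, 6);     /* insn[6:0]                                  */
        unsigned rd     = field(insn, 7, 11);    /* insn[11:7]  - same spot in every format with rd  */
        unsigned rs1    = field(insn, 15, 19);   /* insn[19:15] - same spot in every format with rs1 */
        unsigned rs2    = field(insn, 20, 24);   /* insn[24:20] - same spot in every format with rs2 */
        printf("opcode=0x%02x rd=x%u rs1=x%u rs2=x%u\n", opcode, rd, rs1, rs2);
        return 0;
    }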


In order to be as efficient as a more naïve RISC architecture, x86 has to be very optimized with an Intel-sized budget behind it. For a RISC core to be as performant as a modern x86, it has to become much more complex (and it starts to resemble an x86 core, without the x86 decoder in front of it).


> These days, Intel's 32nm, 22nm chips (and 14nm, from what is known about Broadwell) are more or less identical from a consumer's perspective.

That’s a little bit of a stretch. 32nm was Westmere -> Sandybridge, which doubled FP throughput and L1 load bandwidth, among other changes. Haswell doubled FP throughput again, doubled integer SIMD throughput, and doubled L1 load and store bandwidth, among several other significant changes, and there’s also been a steady ~10% per generation improvement in generic IPC. Consumers don’t care about those numbers, but they do care about some of the operations they enable getting faster. (In fairness, this often lags the release of the processors by a year or two as SW re-optimization is required to take advantage of new features; the full benefit of the changes isn’t available to consumers until some time after the processors are released, by which time we’ve all forgotten just how slow our old hardware was.)
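
Rough numbers behind those doublings, from memory, so treat them as approximate rather than official (peak single-precision FLOPs per cycle per core):

    #include <stdio.h>

    int main(void) {
        /* vector width in floats x FP pipes x FLOPs per op */
        int westmere = 4 * 2 * 1;  /* 128-bit SSE, separate add and mul pipes   ->  8 */
        int sandy    = 8 * 2 * 1;  /* 256-bit AVX, separate add and mul pipes   -> 16 */
        int haswell  = 8 * 2 * 2;  /* 256-bit FMA on two pipes, FMA = 2 FLOPs   -> 32 */
        printf("Westmere %d, Sandy Bridge %d, Haswell %d SP FLOPs/cycle\n",
               westmere, sandy, haswell);
        return 0;
    }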

The energy footprint of those processors has shrunken considerably in the same time, which is a huge improvement for portables.


>The energy footprint of those processors has shrunken considerably in the same time, which is a huge improvement for portables.

Yes and no. The Westmere, Sandy Bridge, Ivy Bridge, Haswell, Devil's Canyon line isn't really targeted at the mobile market. The shrinking power budget is caused by 2 things.

1) To stay competitive against 64-bit ARM processors. Intel processors are barely used in mobile devices. But eventually ARM will attempt to transition into laptop/desktop/server and compete with Intel directly; this is where lower power comes into play.

2) Feature set. New features (functionality) on chip cost not just die space, but also power and heat. Heat and power are by far the most critical issues.

If you're interested in a slightly satirical look at modern hardware scaling I suggest you read The Slow Winter http://research.microsoft.com/en-us/people/mickens/theslowwi...


The second issue is comedically addressed in "The Slow Winter", a satirical document about hardware engineering.

"The idea of creating more and more cores ran into an issue. Most people don't use their computer to simulate 4 nuclear explosions while rendering Avatar in 1080p. They use their computer for precisely 10 things, of which 6 involve pornography.

The other issue was that the 600-core chip dubbed 'Hydra of Destiny' was so smart that its design document was the best chess player in the facility. The problem was it required its own dedicated coal-fired plant, and would run so hot that it would melt its way into the earth's core." - The Slow Winter (paraphrased)


"Portables" was intended to include laptops, at which the processors in question are absolutely targeted. Apologies if that was unclear.


You'll find very few workloads where Haswell's doubling of vector sizes actually doubles throughput. I mean, if your working set is already in your level 1 data cache it will, since you won't be bandwidth constrained in that case and you won't have the wide vector unit on long enough that the chip has to start power throttling. This is a situation that comes up sometimes, but you won't see your program accelerate by anything close to 100% overall.


Except for any computation that can be tiled to use cache effectively so as to not be load-store bound, like FFTs, matrix multiplication, convolution, etc, which sound technical and not relevant to consumers, but actually underlie nearly everything that happens with images or sound on a CPU.

I write optimized compute libraries for a living. Haswell really was an enormous improvement for real workloads (actually, more than 2x for some integer image-processing tasks due to three-operand AVX2 instructions eliminating the need for more moves than the renamer could hide in some loops). Is everything magically 2x faster? No, of course not. Are a lot of important things significantly faster? Absolutely.
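
For anyone curious what "tiled to use cache effectively" looks like, a minimal sketch (the block size is a placeholder you'd tune per cache level, and this is illustrative rather than a tuned kernel):

    /* C += A * B for n x n row-major matrices, walked in cache-sized blocks so the
       vector units read from cache instead of stalling on DRAM. */
    #include <stddef.h>

    #define BLK 64  /* assumed block edge; pick so a few BLK x BLK tiles fit in L1/L2 */

    void matmul_tiled(size_t n, const float *A, const float *B, float *C) {
        for (size_t ii = 0; ii < n; ii += BLK)
            for (size_t kk = 0; kk < n; kk += BLK)
                for (size_t jj = 0; jj < n; jj += BLK)
                    for (size_t i = ii; i < ii + BLK && i < n; ++i)
                        for (size_t k = kk; k < kk + BLK && k < n; ++k) {
                            float a = A[i * n + k];
                            /* Inner loop is contiguous, so it vectorizes cleanly. */
                            for (size_t j = jj; j < jj + BLK && j < n; ++j)
                                C[i * n + j] += a * B[k * n + j];
                        }
    }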



