I like that list but am sorry that they didn't include the 7400. A far more influential part than Transmeta (which I say despite having many friends who were at Transmeta).
Would the Atmel AVR line (such as the ATmega328P or the ATtiny85) count as one set of microchips that shook the world? They played an important role in the rise of modern hobby electronics such as the Arduino, but I don't know how much they are used in practical applications.
After reading the article, it seems the chips they chose shook the world because of their silicon, not because of good tool support like AVR Studio and Arduino, which is what made the ATmegas so popular. The 328 in particular is otherwise a boring, low-powered micro, despite its impact on the hobby world.
Oh, I understand now; the 328 is just a simple microcontroller that isn't really pushing the technology forward, but it's still a pretty useful platform for prototyping a system in an afternoon before ramping up the design and manufacturing.
The AVR line got a lot of traction through an aggressive push via educational institutions, much like ST has been doing with their STM32 ARMs or Microsoft has been doing with its products.
A wide network of peers with hands-on experience and easy access to tools (cheap dev boards) can be a huge factor for hobbyists.
As stated with the 68000: I've often wondered where the industry would be if IBM had picked the 68000 instead of the x86 architecture. The 68000 had a far better design, and the ISA had real legs. Instead we got stuck with the monstrosity that is the x86 architecture. It probably set us back a decade or so, I think.
While I can't answer that, here is a recent submission that goes into why the 68000 wasn't chosen (tl;dr: it was simply too late for the rigorous and therefore time-consuming quality assurance program that IBM had imposed on the IBM PC): https://news.ycombinator.com/item?id=14619360
The 68000 did well and was popular (in Macs, Amigas, NeXT / Sun boxes, etc.) as long as the ISA mattered. By the time that faded, x86 code was mostly compiled from C and 32-bit, which mitigated its weaknesses.
Yes, MS software was terrible, but this had little to do with x86 after the 32 bit transition.
I'm happy to see the XC2018 FPGA on this list. I remember when they came out, but at the time I did not realize just how much could be done with simple logic (I was then interested in bit-slice and microcode), so I ignored them. I did get to use them, though, since both the 2018 and 2064 were still available in the mid-90s, even though they were obsolete.
This list is missing RF chips, though I suppose the success there comes from RF technology on chips in general rather than any single killer chip.
I hadn't seen that the posted article was from 2009 until I read the follow-up, which talked about how amazing the Cell Processor was. Good thing that chip didn't make the cut - just eight years after the article was written, it's pretty obvious the Cell turned out to not be so special or influential.
I wish IEEE had a newspaper! A really great, comprehensive list. It's really good that they also included the CCD chips and the DLP micromirrors!
What I mean is that I wish THEY were the ones who ran a major newspaper and covered daily news with the same depth and subject-matter insight that they showed in this article. A great article.
This is a really excellent list. It includes a lot of things that are easy to forget given how much we've progressed since then, but which were absolutely hugely impactful in their day. What I find fascinating is the degree to which we've come to pretty much master this stuff. A modern smartphone includes analogs of nearly every single chip in the list, or depends on some core element of their functionality in some way.
- National 32032 CPUs. Basically a VAX-on-a-chip, they were relatively slow and pretty buggy. (They were one of the processors we considered using in the Atari ST, but the 68000 won out, thank goodness).
- INMOS Transputer. One of the early "just apply +5V and ground and happy computing!" systems with zero support chips needed, quite easy to put into a grid of compute elements. Very interesting nibble-based instruction set, worth studying. Unfortunately the high-level language INMOS was pushing (Occam) was really strange, and Transputer performance was never really great. There were also some cool microcode bugs (you could lock a Transputer up for minutes by executing a bit-shift instruction with a very large count).
- RCA 1802. Another early microprocessor. Not really a failure, but not a huge success, except for its radiation-hardened variants, which have flown on many space missions.
- Just about any floating-point coprocessor chip. Ugh. Thank goodness the early Pentiums stopped that madness [1].
[1] I heard that Intel wanted to charge for doing floating-point operations on Pentiums. The idea was that you'd pay to have an "enable" fuse blown, which would purchase (say) 100M operations before a disable fuse got blown. There'd be something like 64 fuses, and the last one would enable floating ops forever. The rationale was that the only people who really cared about floating point were folks running spreadsheets . . . and those people obviously had money and would pay for performance! People hated this idea; then 3D gaming started to be popular (with Quake, et al) and suddenly flops were in general use and it would have been a really awful marketing move by Intel.
> Thank goodness the early Pentiums stopped that madness
The idea of a separate coprocessor that only did floating point ended with the 387. The 487SX (the coprocessor for the 486SX) was actually a full 486DX (i.e. a complete 486 with an FPU) that took over the system entirely and disabled the 486SX. But that whole scheme was still its own special kind of madness.
The Encore Multimax used the NS32032. Late-80s students at WPI are familiar with this machine because it replaced the DECSYSTEM-20 as the student mainframe.
Encore did make one product that had fairly wide use: their Annex terminal servers (later sold to Xylogics).
Some of the ideas from the Transputer can be seen in the more modern XMOS chips, though they use XC, a C derivative that adds many of the features Occam had. David May, the architect of the Transputer, was also a founder of XMOS.
The Mill faces the same compiler problems that Itanium and other VLIWs have faced. There's only so much available ILP to be extracted, and even that is hard to do. It turns out that the techniques Fisher developed, trace scheduling [1] and its successors, work equally well for superscalars, thus bringing no net advantage to VLIWs.
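To make "only so much available ILP" concrete, here's a small C sketch of my own (not from Fisher's work or the Mill talks): a reduction whose adds form one long dependence chain has almost no ILP for anyone to find, whether the scheduling is done by a VLIW compiler or by superscalar hardware, while splitting it into independent accumulators exposes ILP to both equally.

    /* My own illustration: available ILP is a property of the code's data
       dependences, not of the machine that runs it. */
    #include <stddef.h>

    /* One long dependence chain: every add waits on the previous one, so
       neither trace scheduling (VLIW) nor dynamic scheduling (superscalar)
       can overlap them. */
    double sum_chained(const double *x, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += x[i];
        return s;
    }

    /* Four independent chains: now there is ILP to extract, and either kind
       of machine can issue the four adds in parallel. */
    double sum_split(const double *x, size_t n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        size_t i = 0;
        for (; i + 3 < n; i += 4) {
            s0 += x[i];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
        for (; i < n; i++)      /* leftover elements */
            s0 += x[i];
        return (s0 + s1) + (s2 + s3);
    }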
The great wide hope lives on, somewhere in the future.
IDK, their register model "the belt" has a temporal addressing scheme that seems to lend itself well to software pipelining in a way that's a pain to extract with a standard register set.
Itanium did as well, but to no avail. It had direct support for modulo loop scheduling. Also, register renaming (which is temporal) is useful for software pipelining.
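For anyone who hasn't seen it, here's a rough hand-rolled C sketch (my own illustration, not anyone's actual compiler output) of what software pipelining / modulo scheduling does: the loop is rotated so that the store from one iteration, the multiply from the next, and the load from the one after that all sit in the same loop body as independent work. Itanium's rotating registers, and the belt's temporal addressing, are hardware ways of doing that renaming bookkeeping.

    /* Plain loop: load -> multiply -> store, serialized within each iteration. */
    void scale(float *dst, const float *src, float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }

    /* Hand-pipelined version: the steady-state body stores iteration i,
       multiplies iteration i+1, and loads iteration i+2, so the three stages
       are independent and can be issued together. */
    void scale_pipelined(float *dst, const float *src, float k, int n) {
        if (n < 2) {                    /* too short to pipeline */
            scale(dst, src, k, n);
            return;
        }
        /* Prologue: fill the pipeline. */
        float loaded = src[0];
        float mult = loaded * k;
        loaded = src[1];
        /* Steady state: one store, one multiply, one load per pass. */
        for (int i = 0; i < n - 2; i++) {
            dst[i] = mult;              /* store result of iteration i     */
            mult = loaded * k;          /* multiply for iteration i + 1    */
            loaded = src[i + 2];        /* load for iteration i + 2        */
        }
        /* Epilogue: drain the pipeline. */
        dst[n - 2] = mult;
        dst[n - 1] = loaded * k;
    }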
I think the Mill people should concentrate on what VLIW has had some success with in the past: embedded. There will be tears if they go after general purpose.
Compiler problems are not the sole reason Itanium failed, perhaps not even the primary one. The compilers were not good initially, that's true. But Itanium was killed more by a combination of factors, especially by the existence of AMD64.
The first Itanium sucked and had a very bad memory subsystem. Itanium 2 was pretty competitive in performance, with one major exception: the x86 emulation. Why buy Itanium, even a very fast one, if you could buy cheaper AMD64 hardware that supports your existing software? Add in that Itanium compilers were few in number, all proprietary (and expensive), and didn't generate very good code for the first few years. Itanium also violated some programmer assumptions (a very weak memory model, and accessing an undefined value could cause an exception).
Now, Mill has a better way of extracting ILP than Itanium does, compiler technology is much better, and JIT compilers are very common. VLIW processors can be very good JIT targets. Mill, if it ever materializes, has enormous potential.
Obviously I agree about Itanium vs AMD64. That's a one punch knockout. It was freaking brilliant of AMD.
However, the Mill doesn't extract ILP; compilers do that. And given a code base, there's only so much available. Yes, compiler technology is much better now, and still there's only so much ILP available.
Lastly, VLIW has been tried with JITs at least twice: Transmeta Crusoe and IBM Latte. VLIW code generation is hard, and it's harder still when you have very little time, which is the nature of a JIT.
Denver is not a VLIW; it's 7-way superscalar [1]. Haswell is 8-way. Wide superscalar is really common and has a lot of the advantages of VLIW without the impossible compiler headaches.
Haswell is generally considered 4-wide [1]. As far as I've heard, Denver really is VLIW. I don't think there are many in-order [2] CPUs that are that wide. I think the article is using superscalar loosely (as in "can execute more than one operation per cycle", which is true for VLIW, although the operations are technically all part of the same instruction).
[1] Apparently it can sustain 5 instructions per clock given a very specific instruction mix.
[2] Or out-of-order, really, POWER8 being the exception.
It isn't OoO, at least not a 224-entry-window OoO. 7-wide can simply mean that it has the decode and execution resources for 7 operations per cycle. It's a tablet core; they don't have a 7-wide OoO core in there.
I was at the EE380 talk :). It's amazing how few people show up to EE380 these days.
Yeah, Denver is in-order superscalar and it's a JIT but it isn't VLIW. Sad to say, they've tried JITing to in-order superscalar as well. They had a design win with the Nexus but even now NVidia is switching over to RISC-V for Falcon.
It is not from lack of effort that the JIT approach hasn't really worked. It's competitive but not outstanding. Denver, from the EE380 talk, thrives on bad bloated code. It's not so good on good code. This is not a winning combination.
Well, Falcon is intended to be a controller, not a (relatively) high-performance core, so that's not an apples-to-apples comparison. If it's not VLIW, then what is it? In-order superscalar? Do you mean superpipelined, or scoreboarded (like an ancient Cray)?
Bad bloated code is 90%+ of the code in the world ;)
Also, shoot me an email at sixsamuraisoldier@gmail.com; you seem to have some inside info on Denver, and I'd love to chat (don't worry, I won't steal any secrets ;) )
This. People fail to understand that Itanium wasn't necessarily the only representative of VLIW.
Better compiler tech in the past few years (you mentioned JIT for example, which Denver has adapted to do quite well) has made VLIW a strong technical contender in several markets. Alas, the overhead cost of OoO is no longer the issue in modern computer architecture.
Denver is a JIT but the microarchitecture is 7-way superscalar [1]. A lot of the Transmeta people ended up on Denver and I'll guess they didn't want to repeat VLIW.
> The Denver 2.0 CPU is a seven-way superscalar processor supporting the ARM v8 instruction set and implements an improved dynamic code optimization algorithm and additional low-power retention states for better energy efficiency.
Note that I actually think Itanium is somewhat architecturally interesting, and that VLIW is simultaneously both over- and under-appreciated. I wanted to just highlight the example of a chip that definitely shook the world, and it didn't even have to succeed to do it!
Regarding the Mill: I still haven't heard a (good) answer for the perennial question: Where's the ILP?
Every time I ask this question, I'm pointed to the "phasing" talk. The issue is:
1) It's quite similar to a skewed pipeline (which Denver already has), in which you "statically schedule across branches" and delay the start until its inputs are ready. Now, Denver did quite well indeed, but it's hardly a replacement for OoO.
2) Even from the examples that they show, it's clear that this is nowhere near 3x the ILP; at best, even in their own example, it provides 2x.
I hope the Mill can do well in embedded, because they have quite a few good ideas (although many of the ideas they claim as their own have already been done elsewhere, such as XOR-ing addresses, done by Samhita), and because the computer architecture industry is currently starved for innovation.
If so, it's from pseudo-vector instructions made by concatenating many opcodes into the same instruction word.
I don't think they claim it performs better than OoO, just that it uses less power.
Nah, it's late here, and I'm already saying stupid things. I was talking about the mechanism that gives parallelism to the CPU, while you are talking about the intrinsic level of parallelism in the programs.
Just dismiss it. I was even going to delete it, but I took too long.
Itanium was just bad execution. They made the classic "rewrite everything and promise parity from day 1" mistake. When you spin off a new line you need time for it to hit maturity, and that usually takes several years. Look at Microsoft Windows: they ran the hacked-up Windows 95 line of the OS alongside the purer NT line for about 5 years before they were able to fold the features of the mass-market client OS into NT and end up with only one major kernel line for development. Similarly, Itanium did end up getting significantly better, but it took years, and in the meantime every other architecture also got better, so it wasn't exactly hugely superior anymore.
When a new technology or system is oversold and overhyped and then underperforms (which is almost inevitable because of product maturity issues), people tend to respond sharply negatively.
The huge disadvantages of Itanium's style of compile-time VLIW, as they were explained to me, were:
1. There is absolutely no flexibility in scheduling. Any EU stall due to memory access delays, non-constant-time instructions, or the like delays completion of the entire instruction word. This makes cache misses disastrous.
2. If a newer CPU makes improvements to the architecture (say, adding new EUs), programs cannot take advantage of those improvements until compilers are updated to support those improvements, and the programs are rebuilt with the new compiler. This is unlike typical superscalar architectures, where a new EU will be used automatically with no changes needed to the program.
As an overarching critique of VLIW, this isn't that bad, but it does ignore a few things:
1. It's not strictly true that there is "absolutely no flexibility in scheduling"; there are a few techniques that can at least mitigate this issue, such as Itanium's advanced loads and a few others. In the aggregate, though, the lack of MLP is absolutely what killed VLIW, especially nowadays.
2. This is true, but IBM's style of "specialization" (which the Mill has adopted) can effectively negate this issue.
There's some truth to the claim that Itanium had terrible execution, but a number of attempts have been made since then as well; high-end general-purpose VLIW is just not the way to go.
Note the "high-end" and "general-purpose": if you're talking about smartphone cores or supercomputing, then VLIW can actually perform quite well. Indeed, Itanium enjoyed decent success in supercomputing, and the Qualcomm Hexagon, albeit not really general purpose, finds itself in many processors doing solid work.
Yup, that's the other classic mistake, committing to a design based on theoretical rather than practical merits.
It's a tempting way to do things; it's very difficult and expensive to spin up and maintain dual development and production pipelines in parallel. But more often than not, that's the best way to go about things. Sometimes you make missteps that seemed sensible at the time. Sometimes you torpedo your whole business because you take too big a step back and can't even survive long enough to let the new thing mature. Look at Intel: they've had multiple missteps that did tremendous damage to their business. They jumped on the NetBurst architecture bandwagon, and that turned out to be a dead end. The Core architecture basically grew out of their Pentium M work.
The Itanium is interesting, because it somehow managed to kill most high-end RISC architectures (MIPS, HPPA, Alpha) without even being on the market. At some point even Sun wanted to drop SPARC for Itanium.
Proof that those chips got us to where we are now: I used most of them during my Bachelor of Computer Engineering (recently graduated).
Chips like the 555 are extremely cheap and reliable. You've got to start with the basics if you want to create more complex designs later.
My colleague listed a few iconic parts:
- 8051 (first real MCU, how could it not be on the list?)
- MAX232 (Maxim's RS-232 interface)
- 7805 (voltage regulator)
- TL431 (...)