Really good read. I graduated from an ECE program (with a focus on VLSI and comp arch) in 2003, but went straight to software after that, so this was a great refresher for me, as well as explaining some of the newer stuff that hadn't been in use then. I especially liked how he explained how caches work, and the different types (direct-mapped vs. n-way associative). I remember it took me a while back in school to really grok caching, but the explanation here was clearer than anything I ever saw in class.
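The direct-mapped vs. n-way distinction clicked for me once I thought of it as address arithmetic. Here's a toy sketch (made-up parameters: 64-byte lines, 32 KB cache; not modeled on any real CPU):

```python
LINE_SIZE = 64          # bytes per cache line (hypothetical)
CACHE_SIZE = 32 * 1024  # total cache size (hypothetical)

def cache_slot(addr, ways):
    """Return (set_index, tag) for an N-way set-associative cache.

    ways=1 is direct-mapped: every address maps to exactly one line.
    Higher associativity gives each set more candidate lines, so
    addresses that share a set_index can coexist instead of evicting
    each other.
    """
    num_lines = CACHE_SIZE // LINE_SIZE
    num_sets = num_lines // ways
    line_addr = addr // LINE_SIZE      # strip the byte offset within the line
    set_index = line_addr % num_sets   # which set the line must go into
    tag = line_addr // num_sets        # distinguishes lines within a set
    return set_index, tag

# Two addresses exactly one cache-size apart land in the same set.
# Direct-mapped: they evict each other. 2-way: both can stay resident.
a, b = 0x10000, 0x18000
print(cache_slot(a, ways=1)[0] == cache_slot(b, ways=1)[0])  # True
```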
Overall I was most impressed by the discussions of the pros and cons of each of the designs and the various tradeoffs involved. Often the "why did they do it this way?" is left out.
Heh, perhaps I just read it faster because I have some familiarity already, but I got through it in about 45 minutes.
I'll add another vote for it. It's a well-done entry in the sparsely populated but useful genre of fairly-but-not-too-technical computer books. Less technical than a book on computer architecture intended for a computer engineer; but more technical than the almost insultingly introductory "how your computer works!" type of books.
Seems like a great summary, but it should probably be expanded to cover a few things that really matter these days but were beyond the scope of the article originally.
It really doesn't explain much that is relevant to how SMP is done, but there's been a lot of interesting architectural progress there: the point-to-point HyperTransport links and on-die memory controller were the two biggest advantages the Opteron initially had over contemporaneous Xeons, which used a shared front-side bus connected to a memory controller in the Northbridge, but it also meant bringing the complexities of NUMA to mainstream systems. Intel's first dual-core CPUs were also just two P4s sharing a socket, which was less effective than later designs that had shared L2 and L3 caches.
I'd also like to see a bit more about GPUs, as they use a different mix of techniques (many cores, in-order, but also VLIW) and have quite different memory and cache systems (eg. ring buses, directly controllable global/local/constant memory regions).
The thing that seems to me to be missing from this (and most similar materials) is at least some discussion of what the control logic for any of these RISC-like datapaths looks like in hardware. In my experience many people expect a lot of complexity and magic there, and are then surprised by how much simpler it actually is.
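To make the "simpler than you expect" claim concrete: for a single-cycle RISC datapath, the main decoder is essentially a lookup from opcode bits to a fixed vector of control signals. A sketch (the opcodes and signal names here are invented for illustration, not any real ISA):

```python
# Main control decoder as a table: opcode -> control signals.
# In hardware this is just a small block of combinational gates
# driven by the opcode field of the instruction.
CONTROL = {
    # opcode: (reg_write, mem_read, mem_write, alu_src_imm, branch)
    "ADD":    (True,      False,    False,     False,       False),
    "ADDI":   (True,      False,    False,     True,        False),
    "LOAD":   (True,      True,     False,     True,        False),
    "STORE":  (False,     False,    True,      True,        False),
    "BEQ":    (False,     False,    False,     False,       True),
}

def decode(opcode):
    """Return the control-signal tuple for an opcode."""
    return CONTROL[opcode]

print(decode("LOAD"))  # (True, True, False, True, False)
```

That's essentially all the "magic" there is for the single-cycle case; pipelining mostly just carries these bits along with the instruction.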
Skimmed it. It seems to be a summary of the contents of a graduate level course in computer architecture - which I think is a good thing. If that's a topic you're unfamiliar with, this looks like a good place to start.
> It seems to be a summary of the contents of a graduate level course in computer architecture
Graduate level? It was in my second year 'computer architecture' class, except in much more grueling detail. But I was a CE major, so that's that... (however, I recall I had many SE's and EE's in that class too).
I was a computer science student in undergrad and grad school. None of the CS undergrad curriculums I was involved with (either as a student or as a TA) had this level of architecture for undergrad CS majors.
PH or HP? My intro course used Computer Organization and Design, and the senior course used Computer Architecture: A Quantitative Approach (both are by the same authors).
Wow, didn't even realize the two authors had written two books on the subject. I meant the former and hadn't realized the parent of my post meant the latter, having not read the title after seeing the authors' names. Whoops!
Yea, any good intro to architecture should cover all this. I haven't yet taken any grad level architecture, but I suspect grad level is mostly "how do we actually do this stuff" instead of "here's what we do"
In my architecture class, we didn't get to a lot of this stuff (sans the caching). However, everything we did cover we implemented via circuit simulation software. You'd be surprised at how tricky some of it can be, although controllers are really simple (at least when you use microprogramming).
> It was in my second year 'computer architecture' class, except in much more grueling detail.
How much more? There are quite a few topics here that weren't covered in my introductory architecture course (e.g. speculation, VLIW, register renaming), though they were in the senior-level one.
He means this is a summary of what they covered, and that the actual class simply spent a week or three on each subject. Obviously if you've got 3 weeks to talk about caches, you're going to learn more (details) about them than what can be written in a paragraph or two.
The article confuses "exponentially" and "quadratically":
> scales roughly quadratically with the issue-width. That is, the dispatch logic of a 5-issue processor is twice as big as a 4-issue design, with 6-issue being 4 times as big, 7-issue 8 times and so on
Searching online, it seems that quadratically is correct; the explanation of "quadratically" is the mistake.
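A quick way to see the mismatch: the article's numbers (each extra issue slot doubling the size) describe exponential growth, not quadratic. Widths here are just the ones from the quote:

```python
# Compare quadratic (w**2) growth against the doubling sequence the
# article actually describes (2**(w-4), normalized so 4-issue = 1).
for width in range(4, 8):
    quadratic = width ** 2         # ratio between steps shrinks toward 1
    doubling = 2 ** (width - 4)    # doubles with every extra issue slot
    print(width, quadratic, doubling)
# 4 16 1
# 5 25 2
# 6 36 4
# 7 49 8
```

Quadratic growth from 4-issue to 5-issue is 25/16 ≈ 1.56x, not the 2x the article's explanation claims, which supports the parent's point that the word "quadratically" is right and the worked numbers are wrong.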
I'm not sure I really agree with everything in his "Brainiacs vs Speed demons" chart. For instance, Power7s generally have higher performance per thread than Core i's, and have about the same ILP. And given that Power7 is 4-way SMT instead of 2-way SMT like the Intel processors, I'm not sure why the Core i's are listed as being more brainiac-ish.
Great article. It mostly matches my computer architecture course, but my professor never went into fine detail like hyper-threading having a 10% logic overhead.
I like how a 500 MHz ARM CPU is as fast as a 1.6 GHz Atom
Those pages don't actually give any evidence of that, just marketing talk. From what I've seen, the current crop of tablet ARM CPUs (dual core, ~1GHz) gets close but doesn't beat a single-core Atom.
Okay, this was all in my undergrad computer architecture course from about 7 years ago. Of course we used a book co-authored by Patterson, so maybe that's why.