Intel's Ivy Bridge Architecture Exposed

Retric · on Sept 17, 2011

I kept reading this and wondering why there was going to be so little improvement when dropping down to 22nm. Turns out they are mostly just bumping the GPU which is useless to anyone with a discrete graphics card.

"Intel isn't disclosing the die split but there are more execution units this round (16 up from 12 in SNB) so it would appear as if the GPU occupies a greater percentage of the die than it did last generation. It's not near a 50/50 split yet, but it's continued indication that Intel is taking GPU performance seriously."

kalleboo · on Sept 18, 2011

> Turns out they are mostly just bumping the GPU which is useless to anyone with a discrete graphics card

I've kept my MacBook Pro locked to the integrated graphics card ever since I bought it, and never really missed the discrete graphics. I've maybe turned it on once or twice to try out WebGL benchmarks. OTOH I definitely would miss the extra 10% battery life.

eliben · on Sept 18, 2011

What percentage of users _use_ a discrete graphics card, though? Especially on ultra-light laptops?

mappu · on Sept 18, 2011

And what percentage of Ivy Bridge users are going to be using ultra-light laptops? It's a desktop part as well.

eliben · on Sept 18, 2011

It is, but I would guess the majority of desktop users are in offices and will be just fine with the on-core CPU (keep in mind it is pretty powerful - the current ones are on par with medium-low range discrete cards, and Ivy Bridge's are likely to be much better).

What really is the percentage of gamers and video editors that need the high-end GPUs?

zokier · on Sept 18, 2011

Saying that current integrated GPUs are on par with medium-low range is still generous. Based on the article, Ivy Bridge GPU is approx 60% faster than in Sandy Bridge. I'd estimate that it would it near the performance of Radeon 6450, which is the weakest discrete card in the series.

That said, I agree that with Ivy Bridge, the GPU performance will be good enough for a lot of uses.

Synaesthesia · on Sept 18, 2011

>I kept reading this and wondering why there was going to be so little improvement when dropping down to 22nm. The new tri-gate transistor design is a big deal. With it and 22nm, Intel are literally doubling performance per watt in Ivy Bridge. That's staggering.

hga · on Sept 19, 2011

Yeah, but with much the same microarchitecture there are limits to what they can do with that. The tick-tock concept is that they only try one hard thing at a time and moving to the new process node with the current microarchitecture is what's Ivy Bridge is about.

If you look at what Intel did with Sandy Bridge when they obviously took advantage of what they'd learned in moving to that process node in the previous step you'll get an idea of how what follows Ivy Bridge is likely to knock our socks off.

Hmmm, there's a further factor, economics: moving to a new process node is fantastically expensive and designing a new microarchitecture is not cheap. By limited how much the push the envelope in the former they can, if they get it right, allow the smaller dies with hopefully a high yield pay off the fab line investment.

hga · on Sept 17, 2011

The thing I found most interesting was "supervisory mode execution protection (SMEP)", AKA rings, available in 64 bit as well as 32 bit modes. This is something that AMD punted in their 64 bit macroarchitecture (which Intel was then forced to copy) and is sorely missed for various low level stuff. Unfortunately the long 64 bits interregnum without rings means it will be a long time before you can depend on others being able to run software depending on it.

marshray · on Sept 17, 2011

What's an example of something that depends on this?

hga · on Sept 17, 2011

I remember that some things depended upon it before AMD came out with their 64 bit architecture, but not what they are.

One thing I do know it's useful for is allowing your GC to run in an intermediate level of privilege between user and supervisor code. Also, if you do it right (i.e. Multics) a system call can be much cheaper because the supervisor isn't running in its own address space etc. Instead, your user level code calls a carefully vetted bit of system gateway code that has a foot in both rings.

bdonlan · on Sept 18, 2011

At least on Linux, there is no address space change when you make a syscall - entering ring 0 _grants access_ to the top half of the address space, where kernel code and data lies, but there's no change in page tables (with all the overhead that implies).

As for code having a foot in both rings, Linux has its VDSO (and Windows has ntdll) running at a user privilege level; I don't really see how having an additional intermediate level would help much.

hga · on Sept 18, 2011

I think the GC example is helpful, I know Azul uses it on their custom hardware and they've also said that they're very glad they decided to go with a standard operating system arrangement, no doubt because their main user program is Hotspot, a by now very gnarly monster of a C++ program.

The idea is that it's good to have something below you to pick up the pieces, and for fast/Pauseless GC it's good to give your GC code supervisor level access to stuff like the CPUs virtual memory (Pauseless maps and unmaps VM at a furious rate).

But you're otherwise correct in that to my knowledge no one has found another really good use of intermediate privilege. On Multics, after they got 8 rings in hardware, user code ran in ring 4, the email system ran in an intermediate ring but that wasn't critical and a very few people built systems with sandboxes in ring 5.

Ah, sandboxes could be another example. Wikipedia says the Google Native Client overhead (presumably on x86_32) is 5%, although I seem to remember a higher figure. But of course you need to be running on an operating system that uses rings and Linus didn't take advantage of the 386's ring system (which turned out to be very good for portability, as noted even to x86_64).

marshray · on Sept 18, 2011

I'm not the world's expert on this kind of thing but I don't think I've seen anything much more complicated than the basic 'supervisor mode bit' get much use in practice.

Even the venerable 286 (yes 2 not 386) had this four ring model and 'tasks' which continue to exist as vestigial organs of the architecture. Not used by anything I'm aware of.

hga · on Sept 18, 2011

Well, I think it's fair to mention that more than a few of us are appalled by how ... conservative the systems development community is. E.g. people persist in building systems that are less advanced than e.g. Multics and ITS (notice some of the comments on BeOS and IO). E.g. the biggest action here other than hypervisors (which are a pretty old concept, it's been a long time since an IBM OS could run on bare metal) and perhaps smartphones (an area I'm just not familiar with) is Linux, which ... foundationally in't very advanced. The base concepts are straight from the early '60s (pre-Multics).

This was true for languages as well; at least at the beginning of the last decade I noted that nothing seriously popular was based on concepts that had been developed any earlier than the '60s or maybe early '70s depending on how your scored OOP (Simula vs. say Smalltalk). Don't know if this applied to Ruby, though, or really even Python, languages with I don't know. Fortunately that Dark Age didn't last long (I suspect because the dot.com crash forced companies to "work smarter, not harder" because of resource constraints).

As for the 286, using it as anything more than fast 8086 was so painful, so crippled and so slow few bothered (OS/2 is the famous and famously not successful exception). Remember the hack with the keyboard controller to get back to real mode? If you didn't need to do that (mostly a device driver issue), it still was extremely expensive to switch segments in protected mode which were still limited to 64KB.

marshray · on Sept 18, 2011

people persist in building systems that are less advanced than e.g. Multics and ITS [and] hypervisors

I find it interesting that the VMware-style hypervisors were originally implemented not using, but in spite of all the virtualizaton features of the CPU. Yet today VM virtualization is considered the most reliable security boundary on shared hardware. No large security-conscious company would share user accounts on a Windows instance with untrusted parties, yet they will share virtual private servers.

I think what it says it that chip designers are lousy at developing what OS developers want and OS designers are lousy at developing what customers want.

286

Remember, this chip was designed before anyone realized that DOS (and DOS-based device drives) was going to take over the world. They never imagined anyone would want to switch from protected mode back to the obviously-inferior real mode. :-)

They were darn lucky IBM thought to put that auxiliary keyboard controller there to do a reset on the main CPU.

hga · on Sept 18, 2011

"I find it interesting that the VMware-style hypervisors were originally implemented not using, but in spite of all the virtualizaton features of the CPU."

Errr, I've not directly studied this but I've read that the x86_32 architecture is particularly hard to virtualize, and that VMWare does it by rewriting the binaries it runs and Xen obviously started out by paravirtualizing the hard stuff.

You're second point speaks more to how horrible Windows security is than anything else, I'd say. The security conscious are more willing to do that sort of sharing in the UNIX(TM) world, but of course it started out as a classic time sharing system. And then Linux at least got seriously hardened by a whole bunch of people, most notably the part of the NSA that does this sort of thing (it's no accident a lot of SELinux is very familiar to someone who knows and/or studied Multics).

But all that said, separate VMs raise very high walls between parts of a system that must be protected from each other. A system with very serious real world security requirements that I'd providing some advice on right now is using XCP and separate VMs to help achieve that. Of course it helps that nowadays we have CPU to burn (and with caches bigger than the main memory of any machine I used in the very early '80s) and that e.g. 2 4 GB sticks of fast DRAM (DDR3 1333) cost ~$80. So I suppose this is in part a "if you have it, why not use it?"

It's certainly the case "that chip designers are lousy at developing what OS developers want" (see the 286, AMD's dropping of rings, the 68000's inability to safely page ... although all of these examples at least have been or will be fixed). As for "OS designers are lousy at developing what customers want" ... very possibly. It's certainly a problem that developing a serious OS is for almost everyone a once in a lifetime effort (David Cutler was famous for doing 3 or so). There's also the curse of backwards comparability, which has also cursed the CPU designers.

But it's worse than that. It's hard to keep the original vision going, and counterwise sometimes part of that vision is wrong or rather becomes wrong as things change. E.g. we all thought the Windows registry was a great idea ... and then it got dreadfully abused (my favorite: using autogenerated 8.3 file names as values, making restores problematic). There are some great and I gather correct screeds about Linus refusing to create an HBI so that keeping existing device drivers from regressing is ... a very big problem. (Of course, that a big competitive advantage for Linux as well, but ... not a very nice one).

onan_barbarian · on Sept 18, 2011

Whoa: mov is a rename and doesn't take a uop. In the leadup to Sandy Bridge this seemed hinted at (the SB renamer handles 'zeroing' registers if you use the right cliche) but SB still didn't do movs as renames. The fact that Ivy Bridge does is pretty cool, and more applicable than a shift to 3-operand forms, as it won't need a recompile to run older code more quickly.

This is pretty cool, at least for those of us who care about this sort of nonsense. I think I have a performance-critical 105-operation loop somewhere that might shed about 8-10 pointless execution slots burned on movs...

ricw · on Sept 17, 2011

Interesting how the focus has shifted from performance / frequency to power consumption. I guess Intel is starting to get a bit of a scare of ARM's impending Cortex-A15 attack on the laptop space..

miratrix · on Sept 17, 2011

I think it's more that the "mainstream" computing has moved on from desktops to laptops.

Instead of targeting the latest and greatest design at 130W/90W TDP desktop parts and downsizing it to a laptop, they've turned it around and is starting to target 35W TDP as their main platform, then upping the power to turn it into a desktop part.

Since power consumption and heat dissipation is becoming an issue also at their top-of-the-line server market, it probably makes much more sense to design things that way. My guess is that high density server design at this point has more in common with laptop-esque power and heat constraints than those of, say, high-end workstations.

nextparadigms · on Sept 17, 2011

I hope that once we have quad core 2.5 Ghz Cortex A15 chips (probably next year), ARM will also start focusing on lowering power consumption after that, or at least split the product line in 2: one that continues to double up the performance every year, while maintaining the same TDP, like they do now, and one where they maintain the performance, but cut the power consumption in half every year.

That kind of performance should be more than enough for smartphones, and probably even tablets, though I could see how we might need more on clamshells. But by having a product line that focuses on lowering power consumption every year, while maintaining the performance will also ensure Intel will never catch-up with them in chips with extremely low energy consumption, and in the same time our smartphones will start lasting more and more.

wtallis · on Sept 18, 2011

ARM already has that strategy in place. They don't stop licensing the A8 core when the A9 MPCore hits the market. They're perfectly willing to let you produce an A8 on 22nm silicon if that's what provides the power/performance balance you want. You should take a look at the full list of cores offered by ARM; there's a lot you won't have heard of if you only pay attention to the flagship phones and tablets.

Someone · on Sept 18, 2011

", or at least split the product line in 2: one that continues to double up the performance every year, while maintaining the same TDP, like they do now, and one where they maintain the performance, but cut the power consumption in half every year."

That is more of a tactic than a strategy. After 3-odd years, there would be a 10 times performance difference between the 'fast' and the 'cool' lines. Chances are there is room between them.

Also, I think is more economical to produce old designs at new technologies for the 'cool' line than to design new 'cool' chips from scratch. That would likely get you 90% of the goal for 10% of the costs.

That 90% probably is sufficient, too, as CPU power usage typically is only a fraction of system power usage. I think that applies even to most embedded stuff for which an ARM is appropriate (ignoring on-chip audio and video decoding hardware)

tallanvor · on Sept 17, 2011

It's not just ARM. Power and cooling costs in server rooms and data centers is a huge consideration, and Intel has to respond to the needs of those customers as well.

zokier · on Sept 17, 2011

I think Intel realized at the end of Netburst era that pushing 150+W of power through small piece of silicon is not trivial anymore. So to have better performance they were forced to improve perf/watt, and that is exactly what they have done since.

acdha · on Sept 17, 2011

Definitely: for at least half a decade most of the HPC people I've worked with or for had power and cooling as the primary constraint, not space or budget. Things like switching to centralized DC (vs. in-server power supplies), getting disks out of cluster nodes, etc. but there was an obvious problem when a rack of servers was roughly comparable BTU-wise to a large gas grill running at maximum.

hga · on Sept 17, 2011

Yeah, buying your hardware is largely a one time capital cost, but power for it and for your cooling is a constant operating cost. Haven't I read somewhere that power is now the largest expense in the lifetime of a "supercomputer"?

acdha · on Sept 19, 2011

Forget operating costs, one major biological research center I knew of hit (IIRC) 20% of their maximum server space before the power company said they'd need to run a new high-voltage line at $$$ and years of delays even if the neighbors didn't kill it. I believe they were able to jump on the lower-power hardware as that market developed but it certainly got a lot of attention in the community.

hga · on Sept 19, 2011

Wow. I suppose they didn't want to move the bulk of their servers because of their sunk costs in space, security, etc. ... but, yeah, I can see that getting a lot of attention.

And you implicitly bring up a point I've read about, that power and cooling are now the limiting factors in server spaces. Moore's Law plus the even faster doubling of disk platter capacity (as of late I've read it's doubling every year or so) has allowed us to pack rather a lot in rather small spaces.

montecarl · on Sept 18, 2011

I think for most any computer that is at constant high CPU utilization power will cost more than the hardware over its lifetime. Unless you have very cheap power.

hga · on Sept 17, 2011

Remember also how they were burned, so to speak, by their Netburst Pentium 4 etc. "marchitecture"; power consumption can be crippling when you push up the speeds and the article comments on both in one sentence, "Ivy Bridge (IVB) is the first chip to use Intel's 22nm tri-gate transistors, which will help scale frequency and reduce power consumption."

I for one am very happy with the Sandy Bridge CPU speeds and price/preformance.

watmough · on Sept 17, 2011

Yes, quad-core Sandy Bridge is incredible, even on the low-end of the scale with the i52400.

It seems like the power-savings in Ivy Bridge will allow a further transition of this tech down to smaller form-factors.

I wonder if the non-premium laptop space will still remain stuck where it is now though? I'm not sure, 5 years on from 2006, what percentage of consumer laptops are significantly faster in use than my MacBook Core Duo.

hga · on Sept 17, 2011

The problem with something like non-premium laptops is that cost tends to be the biggest constraint, so they're not going to give you more than the market demands unless that's "free".

But who knows? You can't buy a Sandy Bridge Xeon chip that's less than 3 GHz.... (Although I forget the clock rate of the very low power OEM only part.)

watmough · on Sept 18, 2011

There's some Core 2 Duo-replacements in the 2 core i5s and 7s.

http://www.anandtech.com/show/3876/intels-core-2011-mobile-r...

Clocks are 2.2GHz to 2.7GHz, all hyper-threaded. Dual and quad-core.

Quite interesting stuff, though I might be annoyed if I bought an i7 laptop then discovered it wasn't a quad-core.

protomyth · on Sept 17, 2011

or the low-end server space or Apple getting grumpy...

cletus · on Sept 17, 2011

The evolution of technology has certainly been interesting. I remember a lecturer once told me that everything old is new again. Basically we go in cycles.

We started with mainframes, then mini-computers and then micro-computers (PCs/Macs). What happened in the last 10 years? Through the "cloud" we basically went back to large computers and timesharing again.

Another turning point in the last 10 years is that basically any PC made since 2000 will probably be sufficient for what most normal users want. We get increasingly powerful CPUs that most people just don't need.

This is part of the reason for the move to lower power consumption. A smaller form factor and lighter computer is something most people care about. Having 6 cores instead of 4 just isn't.

Many pundits have predicted that Web applications will take over. 5-10 years ago there was a reasonable basis for this opinion. Computers would get increasingly powerful and that headroom would make the otherwise horribly inefficient Javascript medium (compared to the compiled languages like C/C++ or even the bytecode languages like Java/C#) dominant as the inefficiencies become irrelevant.

The rise of native apps on mobile is another example of this cycle. Part of this is that native apps have access to libraries that Web pages simply don't but part of it really is performance and the fact that performance once again matters.

Personally I find the manufacturing of chips at 22nm to be simply amazing. When I started paying attention to this stuff IIRC the 386/486 were done on 500nm+ lithography.

I do wonder what the future holds because that number just can't physically get much smaller (with current lithographic techniques).

It's amazing how much power a small chip can get now. I have one of the latest Macbook Airs and it can decode 10 video streams without the fan coming on. A friend has the Core 2 Duo MBA (last year's) and his machine is dying under the same load.

rnemo · on Sept 17, 2011

"Another turning point in the last 10 years is that basically any PC made since 2000 will probably be sufficient for what most normal users want. We get increasingly powerful CPUs that most people just don't need."

That's being quite generous to the Pentium 4. Really the last 4-5 years is when acceptably good CPU power in any circumstance for normal users became the norm. The Core series of processors is what first gave us the sort of power that we've come to expect and Nehalem, Westmere, Sandy Bridge and now Ivy Bridge are all just improvements on that.

Also, according to cpu world, the original 80386 was done on on 1.5µm lithography.

acdha · on Sept 17, 2011

Depends on your software stack, too: the Athlon system I bought in 2000 had no trouble running a web browser with multiple windows while simultaneously compiling, transcoding DV, having email & IM open, etc. on BeOS.

We've gained in many areas but I think it took the hail-mary SSD migration to dodge the usability hit from poor I/O scheduling on Windows, OS X, Linux, etc. This is far from saying BeOS was perfect (e.g. networking was wretched, there was no pervasive color management or visual quality from the switch to GPU compositing) but rather that some of our memories are colored more by the software than underlying hardware.

zokier · on Sept 17, 2011

Just a note, but imho video decoding these days is not very good benchmark on performance, as it is usually handled by somewhat specialized hardware decoder instead the general purpose CPU (even if they both are integrated on the same physical chip).

0x12 · on Sept 18, 2011

Forgive me, but I'm not familiar with general purpose CPU's that 'usually' have a specialized hardware decoder on board for video decoding.

Which CPUs specifically do you mean?

Dear downvoters, thank you very much for showing that HN is a place where simply asking a question is already reason to get downvoted.

Symmetry · on Sept 18, 2011

I'll try to answer your question on why some people might have downvoted you. In spoken English we can use a wide variety of tones to indicate things like sarcasm or strong disagreement when asking a question. Translating this into written English isn't totally straightforward, but one common way of signalling these things is to use scare quotes[2]. In this way writing "Why do you say its usually like that" and "Why do you say its 'usually' like that?" have the same literal meaning but the later has a belligerent or sarcastic subtext. In general, if you don't want to come off as disagreeable you should never put quotes just around single words like usually, and especially not words or phrases that weren't actually used in the conversation like "off die" when questioning them. [2] http://en.wikipedia.org/wiki/Scare_quotes

0x12 · on Sept 18, 2011

Because I'm quoting that specific word from the original text as the one that confused me.

As for the differences between written/spoken English, I'm a non-native speaker/writer with plenty of experience but stuff like this will probably forever delude me.

Symmetry · on Sept 18, 2011

GPUs have included specialized video decoder hardware for the last several generations[1]. Now that the GPUs are being integrated with the CPUs, that capability is available there too.

[1]e.g http://en.wikipedia.org/wiki/Unified_Video_Decoder

0x12 · on Sept 18, 2011

Yes, I know about the decoders on video cards, I'm interested in the ones where it is 'common' (or 'usually' as the OP uses it) to see them integrated with regular CPUs in the same package but 'off die'.

Symmetry · on Sept 18, 2011

Basically all non-netbook laptops sold within the last 6 months have a CPU that has on-die video decoding, but many (though still a minority) desktops don't. I'm not sure why you're bringing MCMs into the discussion, those were the norm in Intel laptops for about a year before that with the Nehalem chips in laptops, but right now the only MCMs being sold are server chips which don't have GPUs at all, but rather require more silicon than can economically be produced in one piece.

jensnockert · on Sept 18, 2011

All Intel Sandy Bridge processors with an enabled GPU have them, also all AMD Fusion APU processors.

0x12 · on Sept 18, 2011

Thank you.

xiaoma · on Sept 18, 2011

>Another turning point in the last 10 years is that basically any PC made since 2000 will probably be sufficient for what most normal users want. We get increasingly powerful CPUs that most people just don't need.

This is utterly ridiculous. My 2009 ASUS laptop struggles with Starcraft 2. I'm NOT happy with that. And it's not like I'm some extreme outlier playing FPS action games, either. I just want a to play a really popular, really mainstream game. The people who don't even want more power than a year 2000 PC are the old, the slow adopters and the boring.

zokier · on Sept 18, 2011

Gaming is often considered to lay outside "what normal users want". It's also an area where laptops are especially weak, and I'd guess that even in your case the GPU is more limiting factor than CPU. Reasonably built gaming desktop from 2009 would certainly run Starcraft 2.

nhebb · on Sept 18, 2011

> Another turning point in the last 10 years is that basically any PC made since 2000 will probably be sufficient for what most normal users want.

I bought a desktop PC back in 2001 that I used daily until early 2010. In those 9 years, I'd bought 5 other systems (laptops & desktops). I used the old system daily because it had all my "stuff" on it, it never felt slow for 90% of the work I was doing, and I had keyboard/mouse sharing with a newer system for the other 10%. I'm still amazed I used it for so long.

listic · on Sept 18, 2011

The march of lithographic manufacturing looks fascinating, considering we are already so close to atomic scale:

Van der Waals radius of the atom of silicon is 0.21 nm. So next year our transistors will be 100 atoms wide?