Intel Broadwell Architecture Preview: A Glimpse into Core M (anandtech.com)
115 points by ismavis on Aug 11, 2014 | 54 comments



14 nanometer process. I vaguely recall people saying how impossible that would be to achieve, back when the original Pentium was built on an 800 nm process.

very cool.


Still, it looks like 5 nanometers is the end of it. http://en.wikipedia.org/wiki/5_nanometer

Not that it will necessarily be the absolute end of Moore's law; hardware manufacturers are trying alternative approaches to keep ramping up computing power. For example, Samsung already sells its 850 Pro series SSDs http://www.amazon.com/s/ref=nb_sb_ss_c_0_6?url=search-alias%... built on its V-NAND memory http://www.samsung.com/global/business/semiconductor/html/pr... which fell back to 40 nm from the 840 EVO's 19 nm while going 3D, and that seemed to improve both speed and reliability. So they now have a bit more runway on their Moore's law curve, but still not much in sight.


Yeah, a transistor made out of 4 atoms? Wow.

The only thing I'm sure of is that when Moore's law ends, we'll be spending a lot more time with Amdahl's law.


Actually, we have been spending a lot of time with Amdahl's law already. Software people assume it only applies to parallelizing software, but in "Computer Organization & Design" by Patterson and Hennessy it is stated in a general form. An example asks how much the multiply unit would have to be sped up to get a fivefold improvement in execution time when 80% of the time is spent in multiplication.

Execution time after improvement = (execution time affected by improvement / amount of improvement) + execution time unaffected
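
To make that concrete, here's a minimal C sketch that just plugs the textbook example into the formula - 80% of the time in multiplication, aiming for a fivefold overall improvement (the multiplier speedups tried below are arbitrary, illustrative values):

    #include <stdio.h>

    /* Evaluates the formula above for the multiply example: 80% of the
     * execution time is spent in multiplication and we want a fivefold
     * overall improvement. */
    static double time_after_improvement(double total, double affected_fraction,
                                         double improvement)
    {
        double affected   = total * affected_fraction;
        double unaffected = total - affected;
        return affected / improvement + unaffected;
    }

    int main(void)
    {
        const double speedups[] = { 2.0, 5.0, 10.0, 100.0, 1e6 };
        for (int i = 0; i < 5; i++) {
            double t = time_after_improvement(1.0, 0.80, speedups[i]);
            printf("multiply %10.0fx faster -> overall speedup %.3fx\n",
                   speedups[i], 1.0 / t);
        }
        /* Even an infinitely fast multiplier leaves the unaffected 20%,
         * so the overall speedup only approaches 5x: the requested fivefold
         * improvement cannot be reached by speeding up multiplication alone. */
        return 0;
    }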


V-NAND is probably not a solution to scaling flash, according to memory experts:

http://thememoryguy.com/comparing-samsung-v-nand-to-micron-1...


I hear about this end to Moore's law all the time, but couldn't we just make the CPUs physically larger?

I know this probably wouldn't work for a mobile device, but for desktops/servers there's a ton of room for larger-dimension chips, right?


(Disclaimer: I'm not an EE)

To be technical, Moore's law is about the number of transistors on an integrated circuit.

So your point isn't too far off the 3D comments elsewhere.

Simply making the die bigger doesn't get you much: larger dies (without additional redundancy) have lower yields (as you're more likely to have a defect given a constant defect/area rate) and fewer can be stamped out of a standard sized wafer.
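
As a rough, hedged illustration of the yield point, here's a C sketch using the simple Poisson yield model, yield = exp(-defect density * die area); the defect density and wafer size are assumed, illustrative values, not real process data:

    #include <math.h>
    #include <stdio.h>

    /* Simple Poisson yield model: yield = exp(-defect_density * die_area).
     * Defect density and wafer size are assumed, illustrative numbers. */
    int main(void)
    {
        const double pi             = 3.14159265358979;
        const double defect_density = 0.25;               /* defects per cm^2 (assumed) */
        const double wafer_area     = pi * 15.0 * 15.0;   /* 300 mm wafer, in cm^2      */

        for (double die_area = 1.0; die_area <= 8.0; die_area *= 2.0) {
            double yield = exp(-defect_density * die_area);
            double dies  = wafer_area / die_area;          /* ignores edge losses */
            printf("die %4.1f cm^2: yield %5.1f%%, ~%4.0f good dies per wafer\n",
                   die_area, 100.0 * yield, dies * yield);
        }
        return 0;
    }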

However, if you carry that idea to its logical conclusion... we may turn from shrinking the transistor to shrinking the packaging as the path of least resistance. 3D transistors, chip stacking (aka package-on-package), and through-silicon vias (i.e. vertical connectivity) all help get us more processing per unit area (while remaining within fundamental thermal, manufacturing, and other physical limits).

Again, this is a CS major with an architecture interest, so anyone please feel free to correct me if I'm off-base.


There is plenty of room for optimisation in overall speed in classical computers, even with the looming end of Moore's Law. In my opinion HP have got the right idea... http://arstechnica.com/information-technology/2014/06/hp-pla...

Silicon photonics is likely to be a huge source of potential improvements. A former Intel SVP, Pat Gelsinger, was quoted as saying "Today, optics is a niche technology. Tomorrow, it's the mainstream of every chip that we build." http://en.wikipedia.org/wiki/Silicon_photonics


3D doesn't help very much. We're already at the limits of thermal capacity for consumer chips, and 3D just exacerbates that, since you've now lost a whole dimension you can shunt heat through.

If you go 3D, you need a very large drop in heat dissipation to keep your junction temperatures down.


The problem is that if you make a CPU twice as large, with transistors that have the same resistance and power consumption as before, you double the heat output and double the power input.

So yes, you can. But if you do this for more than 26 months, we'll end up with 600 watt CPUs.


Also, your 1 cm2 die becomes 2 cm2, then 4 cm2... you pretty quickly run out of physical space, and your interconnects get long enough that propagation delay becomes significant.
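
For a sense of scale, here's a back-of-the-envelope C sketch bounding how far a signal could possibly travel per clock cycle, using the vacuum speed of light as the wildly optimistic best case (real on-chip wires are far slower, being RC-limited); the frequencies are just illustrative:

    #include <stdio.h>

    /* Upper bound on signal travel per clock cycle, using the vacuum speed
     * of light as the best case; real wires are much slower (RC-limited). */
    int main(void)
    {
        const double c_mm_per_ns = 299.792458;   /* speed of light in mm/ns */
        const double freqs_ghz[] = { 1.0, 2.0, 3.5, 5.0 };

        for (int i = 0; i < 4; i++) {
            double period_ns = 1.0 / freqs_ghz[i];
            printf("%.1f GHz: cycle %3.0f ps, light covers at most %3.0f mm\n",
                   freqs_ghz[i], period_ns * 1000.0, c_mm_per_ns * period_ns);
        }
        /* Even this best case is only ~60 mm per cycle at 5 GHz; once wire RC
         * delay is included, crossing an ever-larger die in one cycle quickly
         * becomes impossible. */
        return 0;
    }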


And to fix your propagation delays you introduce longer and longer pipelines... then you basically end up with Prescott.


Apart from the power considerations, that's very expensive. The cost of silicon grows more than linearly in area (I don't remember the exponent, someone help me out here...) because of yield issues: the larger your chip, the more likely it is that an imperfection will cause it to be worthless. With very regular designs like DRAM and flash they reduce the impact of this through redundancy, but last I heard this wasn't really considered practical for logic designs like CPUs. You can sacrifice an entire core, although apparently you have to provide for that by making a voltage island around each one, which uses some area, not to mention being a nasty analog complication; although I think those are coming in anyway due to the desire to reduce base power by switching unused blocks off.


GPUs have for generations been designed with the intention of being able to disable defective compute units; that's why we have more than 3 models from each manufacturer. It's not done quite as much in the CPU space, but Intel does sell some server CPUs that have some cores disabled, and AMD has even sold 3-core CPUs that didn't quite pass QA as 4-core parts. There's also variability in the amount of cache memory that gets enabled on parts made from the same die.

For CPUs, it's not done at any granularity finer than a whole core - nobody sells CPUs that might have one of the ALUs disabled.


Also, K-series CPUs (for overclockers) are chips with a defective IOMMU, small "Xeon" E3 chips are Core chips with a defective GPU, and there is probably a lot of similar binning going on with other parts.


GPUs have only recently been used to hold data for which losing bits may trigger a disaster (financial, banking, system data, etc.). Before that, it was just vertices, shaders, models, textures, and so on.


They can be made larger, but that would reduce the manufacturer's yield. The bigger a single chip is, the more surface area you lose when you get an error.

Imagine a scratch that runs straight through the middle of several chips, parallel to 2 out of the 4 sides, and runs for 175 millimeters. If the chips are 50mm x 50mm each, you would lose at most 12,500 sq mm of silicon (3 scratched through and 2 with 12.5mm scratches = 5 x 2500 sq mm). If they are, say, 60mm x 60mm, you could lose 14,400 sq mm of silicon from the same error (2 scratched through and 2 with 27.5mm scratches = 4 x 3600 sq mm).
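
Here's a small C sketch of that worst-case arithmetic, using the same 175 mm scratch and die sizes quoted above (it assumes the scratch length is not an exact multiple of the die size):

    #include <stdio.h>

    /* Worst-case damage from a straight scratch of length L parallel to the
     * die grid: it fully crosses floor(L/s) dies and clips a partial die at
     * each end. Scratch length and die sizes are the ones quoted above. */
    int main(void)
    {
        const double scratch_mm  = 175.0;
        const double die_sizes[] = { 50.0, 60.0 };

        for (int i = 0; i < 2; i++) {
            double s     = die_sizes[i];
            int    full  = (int)(scratch_mm / s); /* dies crossed end to end     */
            int    total = full + 2;              /* plus a partial die each end */
            printf("%2.0f mm dies: up to %d ruined (%d full + 2 partial) = %5.0f sq mm lost\n",
                   s, total, full, total * s * s);
        }
        return 0;
    }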


IBM's Z-series CPUs are pretty ridiculously enormous in this regard. The z196 is 512 mm^2 [0] (compare with about 260 mm^2 for a top-of-the-line Intel Xeon) and is able to run at 5.2 GHz on a 45 nm process. Because IBM can basically charge anything they want for these monsters [1], their yield of working chips per wafer can be pretty terrible and they'll still make money.

Intel, on the other hand, needs to be able to sell as many chips as possible per wafer, because they have vastly lower margins. This is also why they do stuff like fusing off dead cores and cache to produce working lower-end parts from dice that aren't 100% functional out of the factory.

[0] https://en.wikipedia.org/wiki/IBM_z196_%28microprocessor%29

[1] Locked-in customer base, service costs dwarf hardware costs, etc.


That's being done for Xeons and discrete GPUs; 700 sq. mm. chips are now a thing. It increases performance but doesn't help price/performance or power efficiency.


In addition to what has been said (power requirements/efficiency), signal propagation becomes a problem; you may not reach enough of the CPU die during a clock cycle.


Source? I don't think it's necessary for a signal to propagate through the entire chip within a clock cycle.


It's not necessary, and it doesn't happen on the fastest chips around.

But you'll be in quite a bad situation if a signal can't propagate through at least an entire ALU and register file within a cycle. Not an impossible situation, but a bad one.


Itaniums did this and it contributed to the expense. I don't think that "scales" for our priorities (power, cost).


You can still transition away from silicon lithography, in addition to trying alternative transistor layouts. Graphene is the most-cited alternative, but there may be other untested options that just require rarer materials.


Please Intel, release it in 2014! Don't make it slip into 2015! I really really want a 10W NUC with Broadwell i7 as my main desktop machine!


It seems Intel's misleading marketing is already working, even on the HN crowd. The whole point of the "M" chip is to be low-power/low-performance. If you want a "Core i7", get a version that actually has a lot of power behind it, not a "10W i7". The 10W Core i7 won't be faster than a desktop Sandy Bridge Core i3. Heck, I bet it won't be more than 50 percent faster than the latest ARM chip either.


I don't really care about super-high performance - I need a small NUC that I can cool passively, with reasonable performance for the typical business stuff I do on the desktop. I understand the difference between mobile i5 and i7 is almost nil even on Haswell (both are dual-core with hyperthreading, unlike the desktop versions). I already have a 17W Ivy Bridge i7-3517U in one of my notebooks, so I know what I am talking about. I also can't wait for the 19W Kaveri for another tiny desktop setup...

EDIT: i7 details


There are excellent passively cooled cases for the HTPC crowd that work very well for regular desktop use. Some can cool real-deal desktop Intel processors.

The performance is something else entirely compared to the super-low-power NUC stuff. Video editing and processing are suddenly not a problem.

I got the H1 from HD-Plex.com myself. It's specified to cool 55W, but I run the 65W version of the i7 (so it's semi-low-power) without any problems, not even at full load for extended periods of time.

It takes mini-ITX motherboards, and stuffing it with 16 GB of RAM was not a problem. It was a bit fiddly to get everything together, but the build quality is great.

No relation, just a satisfied customer.


I have a low-power i3 I use in a FreeBSD NAS box for the same reason.

The thing's a bit overpowered even for its use as is. I even started setting up VMs on the box to give it more to do.

I also love these low-power CPUs for desktops/home systems. When they use less power than a lightbulb, I'm all for it.


I just recently built and tested a Kaveri-based, NUC-scale mini computer. I used this case: www.newegg.com/Product/Product.aspx?Item=N82E16811129185 and it runs really swell. I even gave it a nice workout in kdenlive, baking a multi-track video project I made on my i7 desktop - on my desktop it took 50 minutes, on the Kaveri it took 4 hours. But that is still within the realm of reason.

The part is 25W TDP, and there are passive heatsinks for it, but even with the fan at 25% speed it's quieter than a mechanical hard drive.


I don't see why you need an i7. I have a passively-cooled Celeron NUC (Zotac CI320) that's fine for web browsing and 'productivity'. It gets hotter than I'd like though.


From Wikipedia, Intel also has quad-core + hyperthreading mobile CPUs, with model numbers 4700 and up. The top mobile i7 is about equivalent to a 4770S desktop chip, but with a bit lower TDP.

So don't lump all the mobile parts in with the ultrabook-optimised chips.


Sure, there is a 47W i7-4960HQ in the rMBP, which has significantly higher power consumption than what I am looking for. If you look at the low-power U versions of the i5 and i7, they seem almost identical (with the i7 having a larger cache and a slightly higher frequency; both are dual-core and have hyperthreading). On the desktop, you can see the i5 lacks HT, unlike the i7.

If Broadwell can bring a 20W quad-core i5/i7, that would be even better! Currently only the Bay Trail Pentium J can offer that.


I have an 8-core server Atom which is around 20W. Single-threaded performance is slow, but it's pretty good overall.


Your use-case is not the same as everyone else's. It's not necessarily the same as anyone else's. Please don't presume otherwise, and then tell people that their use-cases are wrong.


Can anyone in the HN crowd shed light on the Intel 'S' suffixed chips? They're designed to a smaller thermal envelope, but I cannot find any description of how this works. Do they draw as much power as a non-suffixed chip until they start to heat up, and then limit themselves - or are they 'clocked down' for their entire range?

I've been trying to work out what spec I can power off a PicoPSU and this becomes important when the TDP is all I have to go on for power consumption figures.


The low-power i7 obviously is the premier low-power part. And exactly like bitL, I would like a nice very-low-TDP i7, preferably in a passively cooled desktop machine, grossly overpowered for the vast majority of tasks that people do. Especially given that anything of consequence I do is on big fat machines with quad Xeons sucking thousands of watts in a rack somewhere.


AKA: the chip that will finally let you buy a Retina MacBook Air.


If the GPU is fast enough to drive a retina display.


Even the "hd graphics" or low end Kaveri GPUs from AMD can drive 4k.

Hell, Adreno from Qualcomm can probably drive 4k.

Desktops, compositing, and light effects like transparency are nothing next to video game or 3d model rendering. In practice, any GPU manufactured today is enough for non-3d purposes, unless you try running 3 4k monitors off the numberless hd graphics or something.

If you aren't getting 60 fps scrolling, it is usually due to the cpus being low powered and the application not using GL accelerated drawing for the scrolling.


Also, just because the GPU can drive 4K doesn't mean it can do so with a reasonably small amount of power, such that the MBA can continue to pull a 12-hour battery life.


If you are plugged into a 4k monitor, you are probably reasonably close to a power outlet.


Theoretically, Thunderbolt is capable of being your power outlet.


Yeah but if you're powering your 4K monitor from Thunderbolt, I doubt you'd get that 12 hour battery life either :D


Why wouldn't it be?

The Retina MacBook Pros already use the onboard Intel GPUs on all but the highest-end models, and I suspect they'll only get more powerful with Broadwell.


Well, the current-gen Haswell can drive a 4K display, and the AnandTech article suggested this GPU is the same but 'better' (more efficient, and with feature parity with DirectX 11_2), so I would guess it can drive a Retina display (which is nominally a 2K display on a 13" laptop).


But being able to attach a 4K monitor doesn't necessarily mean it can scroll a window at 60 fps.


The iPad 3 could refresh 2/3 of the pixels at 60 fps using less power than the MacBook Air.


In practice, yes. Technically, Apple probably could have done it already if they needed to. Samsung has an ultrabook that's smaller than the 13" MacBook Air and rocks a 3200x1800 screen: http://www.samsung.com/us/computer/pcs/NP940X3G-K01US .. I've been tempted to get one to Hackintosh, but I just know it'd be flaky as all get out.


One thing I'm looking forward to more than the power-saving advances is the arbitrary-precision math support they are finally starting to add. Albeit just integer add and multiply instructions for now, but it's a good start.


Cool, I hadn't heard of this. I couldn't find much info on a quick search, so here's Intel's reference: http://www.intel.com/content/dam/www/public/us/en/documents/...

MULX was introduced on Haswell as part of the BMI2 instructions. Basically, it allows two explicit destinations (though there is still an implicit rdx source).

ADCX/ADOX are pretty weird. They both operate basically the same as ADC, but read and write only the carry or overflow flag, respectively. The idea is to allow two interleaved, independent chains of adds, putting less pressure on the out-of-order logic on the chip. This seems kinda weird though: the renaming logic would have to detect that the bits of the flags register that one chain of adds reads aren't written by the other chain, so those instructions wouldn't be hazards, and thus both chains could be executed in parallel. OTOH, IIRC there are other instructions that only write certain bits of the flags, so I guess the logic for this bit-level renaming of flags is already there.
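
To make the interleaving idea concrete, here's a hedged C sketch using the _addcarryx_u64 intrinsic from <immintrin.h>: two independent 256-bit additions whose carry chains never mix. Whether a given compiler actually assigns the two chains to ADCX and ADOX is up to the toolchain, and the operand values below are purely illustrative:

    #include <stdio.h>
    #include <immintrin.h>   /* _addcarryx_u64 (ADX); build with e.g. -madx */

    /* Two independent 256-bit additions (four 64-bit limbs each); each chain
     * carries only through its own carry variable, so the hardware can
     * overlap them. Final carry-outs are ignored in this sketch. */
    static void add256_x2(const unsigned long long a[4], const unsigned long long b[4],
                          const unsigned long long c[4], const unsigned long long d[4],
                          unsigned long long ab[4], unsigned long long cd[4])
    {
        unsigned char k1 = 0, k2 = 0;          /* one carry per chain */
        for (int i = 0; i < 4; i++) {
            k1 = _addcarryx_u64(k1, a[i], b[i], &ab[i]);   /* chain 1 */
            k2 = _addcarryx_u64(k2, c[i], d[i], &cd[i]);   /* chain 2 */
        }
        (void)k1; (void)k2;
    }

    int main(void)
    {
        unsigned long long a[4] = { ~0ULL, ~0ULL, 0, 0 }, b[4] = { 1, 0, 0, 0 };
        unsigned long long c[4] = { 5, 0, 0, 0 },         d[4] = { 7, 0, 0, 0 };
        unsigned long long ab[4], cd[4];
        add256_x2(a, b, c, d, ab, cd);
        printf("ab = %llu %llu %llu %llu, cd[0] = %llu\n",
               ab[0], ab[1], ab[2], ab[3], cd[0]);
        return 0;
    }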


Sounds like Intel went after a lot of one-time optimizations for these low-power chips, stuff they aren't going to be able to call on repeatedly to improve performance. Maybe I'm wrong, but it seems like they are really pulling out all the stops here.


I love ultra-low-power CPUs that are still able to drive a modern minimal Linux system based mostly on CLI applications. I thought my Intel NUC was already great, but this might take it to the next level with a completely fanless design that runs cooler yet is still very capable.


Apple's new CPUs are ready. MacBooks will be ready for the holiday season. Nothing to see here.



