Apple's Cyclone Microarchitecture Detailed

fidotron · on March 31, 2014

Apple almost certainly have a MacBook Air based on one of these chips at prototype stage.

There are many reasons in favour, including keeping negotiations with Intel over price interesting, but probably the main thing preventing them running with it would be an inability to manufacture the chips fast enough, which is a major concern in mobile land. This is why lots of Android devices exist in variations based on different chips as hedges by the device manufacturer on long term availability.

mx12 · on March 31, 2014

I don't think that Apple will completely get away from x86 for a long time. Attempting to emulate the x86 on an Arm would be terribly slow.

I do however, think that they will eventually include both arm and x86 processors in the Macbook Air. That way, backwards compatibility is preserved and low power apps can run on the arm. In the current Macbook Pros they have dynamic switching of GPUs, there's no reason they couldn't use the arm as a coprocessor or even run the full OS.

Here a few technical points:

* LLVM - You can compile once for both both architecture and then the JIT will take over compiling for the specific architecture (Take a look at llvm for the OpenGL pipe line in OSX)

Full screen app- When an app is full screen (if it's a low power app) then the x86 could sleep and the arm could switch to running the app.

App nap - If all x86 apps are a sleep, switch over the running the arm processor exclusively

*Saving State - It's possible to save the state of apps in OSX, a similar mechanism could be used to seamlessly migrated between processors.

This is pure speculation, but it is feasible. There would be many technical challenges that Apple would have to solve but the a capable. The advantage Apple has is that they have absolute control over both platforms.

klodolph · on March 31, 2014

> I don't think that Apple will completely get away from x86 for a long time. Attempting to emulate the x86 on an Arm would be terribly slow.

It's not you're emulating an entire operating system: the operating system (and many libraries) are native, but the application code is emulated. It's faster than you think, and Apple has already done it twice: once in 1994, and once in 2005 (exercise for the reader: try extrapolating).

Apple's applications would be 100% native long before the ARM version shipped. Some intensive tasks—text rendering, audio/video/image encoding and decoding, HTML layout, JavaScript—these would also be native on third-party apps, since you just have to write the right glue into the emulator. This would be a lot easier than the 2005 switch from PowerPC to x86, which involved emulating a system with 4x as many GPRs and the opposite endian: ARM has 2x the GPRs as x86-64.

Sure, a bunch of apps will see reduced performance. Some will break. But remember: Apple has only been on x86 for ten years. We had the same problems during the PowerPC->x86 transition: you had to wait about two years to get a version of Photoshop that ran on x86 + OS X.

I'm willing to bet that Apple has been testing OS X on ARM for years now.

ksk · on April 1, 2014

>but the application code is emulated. It's faster than you think, and Apple has already done it twice: once in 1994, and once in 2005 (exercise for the reader: try extrapolating).

Well in PowerPC->x86 transition x86 was the faster chip. The emulation cost was discounted by some amount. If you go from x86->ARM, ARM is the slower chip, so there's never going to be _improvement_ in performance compared to x86. I don't see why you're equating the two.

fidotron · on March 31, 2014

That is an interesting idea.

At one point I had a machine which allowed movement of data between apps running in Windows 3.1 on a DX4 and Risc OS on a StrongARM. I'm wondering how hard it would be with OSX (a far stricter OS than Win3.1 or Risc OS) to allow a sort of process level separation, where the OS and any ARM compatible processes are running on an ARM, but you keep an x86 around and spin it up for certain processes.

metrix · on March 31, 2014

This sounds like it is ripe for a Semi-custom chip from AMD, where they sell an ARM/X86 SOC

zaroth · on March 31, 2014

  I do however, think that they will eventually include both arm and x86
  processors in the Macbook Air. That way, backwards compatibility is
  preserved and low power apps can run on the arm. In the current Macbook
  Pros they have dynamic switching of GPUs, there's no reason they couldn't
  use the arm as a coprocessor or even run the full OS.

Battery life of the Macbook Air is already outstanding. I think I want something more than just better battery life for all that complexity, but eventually it will all be just "taken care of" by the toolchain, so why not?

I think the low hanging fruit would be running iOS apps native on the Macbook Air.

joakleaf · on March 31, 2014

No reason to JIT -- They already support fat binaries.

noahl · on March 31, 2014

I'm skeptical about moving existing apps seamlessly between x86 and ARM processors, because you'd need guarantees about process memory layout that I don't think any current compiler makes. Imagine the memory image of a process running on the ARM chip. It has some instructions and some data:

    |        Data          |         ARM insructions        |

You could certainly remove the ARM instructions and replace them with x86 instructions. However, the ARM instructions will have hard-coded certain offsets in the data buffer, like where to look for global variables. You would have to be sure that the x86 instructions had exactly the same offsets. For another issue, if the data buffer contains any function pointers, then the x86 and ARM functions had better start at exactly the same offsets. And if there are any alignment requirements that differ between x86 and ARM (I don't know if there are), then the data had better be aligned to the less permissive standard on both chips.

None of these problems are impossible to solve. They could be solved easily by adding a layer of indirection, at the cost of some speed, and then Apple could go back and do the real difficult-but-fast implementation later.

However, why would it? When its ARM cores are essentially desktop-class, there's no need to have an x86 chip other than compatibility with legacy code. Looking at Apple's history, it seems pretty clear that it likes to have full control of its own destiny, and designing its own chips is a logical part of that, so having its own architecture could be considered a strategic move too.

So given the difficulty of implementing it well, and assuming that Apple eventually wants to have exclusively Apple-designed ARM chips in all of its products, if I were in their shoes, I wouldn't bother to make switching work. I might have a product with both kinds of chips, but I would just have the x86 chip turn on for x86 apps, and off when there were no x86 apps running, and know that eventually those apps would go away. (And because I'm Apple, I have no problem pushing vendors to switch to ARM faster than they want to, so this won't be a long transition.)

However, an even cooler move would be to make LLVM IR the official binary representation of OS X, and compile it as part of the install step of a new program. That gives Apple several neat capabilities:

1) They can optimize code for the specific microarchitecture of your computer. Maybe not a huge deal, but nice

2) They can iterate on their microarchitecture without having to care about the ISA, because the ISA is an implementation detail. This is the technically correct thing that everyone should have done years ago (yes, I'm annoyed).

3) They can keep more secrets about their chips. It's obnoxious, but Apple would probably care about that.

So, there's my transition plan for Apple to move to its own chips. It probably has many holes, but the biggest one is still the question of what Apple gains from this. Intel still has the best fabs, and as long as that's true, there will be some advantage in sticking with them. Whether the advantage is big enough, I don't know. (And when it ends in a few years, then who knows?)

jws · on March 31, 2014

Older enough programmers will remember the DEC Vax to Alpha binary translators. When DEC produced the Alpha you could take existing Vax binaries, run them through a tool, and have a shiny new Alpha binary ready to go.¹

Given such a tool, which existed in 1992, it seems simple enough to do the recompile once on the first launch and cache it. Executable code is a vanishingly small bit of the disk use of an OS X machine.

Going forward, Apple has a long experience with fat binaries for architecture changes. 68k→PPC, PPC→IA32, IA32→x86-64. I don't think x86-64→ARM8 is anything more than a small bump in the road.

As far as shipping LLVM and letting the machines do the last step, that should make software developers uncomfortable. Recall that one of the reasons OpenBSD needs so much money² for their build farm is because they keep a lot of architectures going because bugs show up in the different backends. I know I want to have tested the exact stream of opcodes my customer is going to get.

␄

¹ I think there was also a MIPS to Alpha tool for people coming from that side.

² In the sense that some people think $20k/yr for electricity is a lot.

jey · on March 31, 2014

Yep, and Apple has already done dynamic binary translation once before, during the PPC to x86 switch.

BruceM · on April 1, 2014

And 68k to PPC.

wsc981 · on April 1, 2014

  Going forward, Apple has a long experience with fat binaries for architecture 
  changes. 68k→PPC, PPC→IA32, IA32→x86-64. I don't think x86-64→ARM8 is anything 
  more than a small bump in the road.

Using the lipo[0] tool provided as part of the Apple Developer tools, it's pretty easy for any developer to create a x86/ARM fat binary. Many iOS developers have used this technique to create libraries that work on both the iOS simulator as well as a iOS device.

[0]: http://ss64.com/osx/lipo.html

pjmlp · on April 1, 2014

> As far as shipping LLVM and letting the machines do the last step, that should make software developers uncomfortable.

Why? This is how Windows Phone 8 works (MDIL) and Android will as well (ART).

On WP 8 case, MDIL are ARM/x86 binaries just with symbolic names left in the executable. The symbolic names are resolved into memory addresses at installation time, by a simplified on device linker.

Android's ART, already made default on the public development tree, compiles dex to machine code on device installation.

theatrus2 · on April 1, 2014

This was also the entire premise of the Transmeta CPUs

ajtulloch · on March 31, 2014

> However, an even cooler move would be to make LLVM IR the official binary representation of OS X...

It's worth revisiting http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/0437... ("LLVM IR is a compiler IR"), from a core LLVM developer, explaining why LLVM IR is unsuitable for this task.

noahl · on March 31, 2014

Ah, thanks for posting the email. You're right. :-)

revscat · on March 31, 2014

> However, an even cooler move would be to make LLVM IR the official binary representation of OS X, and compile it as part of the install step of a new program.

I've wondered the same thing. In this respect the IR is analogous to Java bytecode, or C# CLI. In this context these share conceptual similarities, allowing for multiple languages to target the same runtime.

This would possibly open up iOS to being able to be more easily targeted by languages-that-aren't-Objective-C. As long as it compiles down to LLVM IR then this "binary" becomes language agnostic. (Actually, for all I know things like RubyMotion do this today. I haven't delved into it to find out.)

ahomescu1 · on March 31, 2014

> I'm skeptical about moving existing apps seamlessly between x86 and ARM processors, because you'd need guarantees about process memory layout that I don't think any current compiler makes. Imagine the memory image of a process running on the ARM chip. It has some instructions and some data:

There was a paper at ASPLOS 2012 where they did something like this, but for ARM+MIPS [1]. Each program would have identical ARM and MIPS code (which took some effort), with identical data layout.

1 - http://cseweb.ucsd.edu/users/tullsen/asplos2012.pdf

zanny · on April 1, 2014

> However, an even cooler move would be to make LLVM IR the official binary representation of OS X

The IR isn't architecture portable right now. IE, you can't use it as a live interpreter language, because the code it produces make assumptions on the target architecture before final binary translation.

It would be fantastic if Apple would fix LLVM so the IR was portable, it would be amazing for general purpose software if you could ship LLVM IR and have your end users compile it or have web services do it for target devices on demand.

tgma · on April 1, 2014

Google's Portable Native Client essentially does something similar: https://developers.google.com/native-client/dev/

reubenmorais · on March 31, 2014

>However, an even cooler move would be to make LLVM IR the official binary representation of OS X, and compile it as part of the install step of a new program.

So a user installing Firefox or Chrome or some other complex application would need to wait for tens of minutes before they can use their application? It's more likely they'll just re-use the existing dual arch architecture but instead of PPC/x86 it'll be x86/ARM…

Tloewald · on March 31, 2014

OpenStep supported four processor architectures before it got ported to PowerPC, and used to support "quad fat binaries" that would run seamlessly on all four architectures.

_pmf_ · on April 1, 2014

> I don't think that Apple will completely get away from x86 for a long time.

The switch from PPC to x86 was less painful than many thought. I don't see any particular reason why the "fat binary" approach (which is technically gross, but quite practical) would not work for switching from x86 to ARM.

Developers will hate it, but the whole app store farce shows what Apple thinks of developers.

legulere · on March 31, 2014

Nah, LLVM is to slow for JIT. However the final compilation step still could be done at install-time or link-time.

zimbatm · on March 31, 2014

Plus Apple's executable file format already supports bundling multiple binaries for each target arch. https://en.wikipedia.org/wiki/Universal_binary

graetzer · on April 1, 2014

I don't think they would nessesarily build in both processors. They'll just assume that developers will update their apps very fast.

The whole XCode toolchain already supports x86_64 and arm64. It's trivial to build "fat" binaries with clang that run on both platforms.

sliverstorm · on March 31, 2014

I would think porting OSX to ARM would be the real hurdle? Not undoable by any means, but the lion's share of the challenge.

mikeash · on March 31, 2014

The real challenge will be figuring out the compatibility question for third-party software. Do you make everybody recompile before their stuff works on the new hardware? Do you ship an emulator for x86-64 for a while, like they did when they went PPC->Intel?

OS X and its predecessors have already been ported to run on a bunch of architectures: x86-64, i386, PPC64, PPC32, PA-RISC, SPARC, and Motorola 68k. All, or nearly all, of the CPU-dependent parts of the OS are the parts that are shared between OS X and iOS. Making OS X run on ARM would be trivial for Apple, and I'd be shocked if they didn't have it up and running already as a hedge.

chipotle_coyote · on March 31, 2014

The emulation question is the main reason -- not the only reason, but the main reason -- I don't see Apple really moving in this direction with the Mac line any time soon. But I don't think it's just a matter of third-party software for OS X that's the issue; Windows compatibility is a pretty big deal. Apple doesn't push Macs as being great Windows machines currently (although they did for a while), but they're certainly aware of this use case and how it's been a minor but measurable plus for the Mac line. Even as much as Apple is known for burning down the old to make way for the new, I think they'd be pretty cautious with this one.

The notion that this CPU is really about giving ample power to the iOS line makes more sense to me. There will eventually be an "iPad Pro," whether or not it gets stuck with that moniker, and it won't have to worry about backward compatibility with anything other than previous iOS devices.

kevinchen · on March 31, 2014

I don't see low-end users (think entry-level MacBook Air) running virtualization or Boot Camp. The well-known Mac developers will recompile their apps as soon as Xcode is updated, and huge apps like Office will probably get a pre-release version beforehand. So a cheaper ARM MBA should be usable for most of it customers.

__david__ · on March 31, 2014

> I'd be shocked if they didn't have it up and running already as a hedge.

Especially because they were known to have OS X running on x86 as far back as 10.0, years before they actually switched to Intel officially.

gsnedders · on April 1, 2014

Rhapsody, the stepping-stone between OPENSTEP and OS X, had public releases on x86, while the PPC port was new.

rsynnott · on April 1, 2014

> I'd be shocked if they didn't have it up and running already as a hedge.

This seems very likely to me. They presumably have a fair bit of the work done with iOS, which shares a kernel codebase and so on, and really, it'd be worth the effort just to have it to show to Intel executives during negotiation.

endianswap · on March 31, 2014

That was my thought, too. It seems like if they do this then they'll be going down the same path as Microsoft did with the Surface RT and Surface Pro, right?

twoodfin · on March 31, 2014

Not exactly. Remember that the Surface RT deliberately restricted the applications that could be run via Win32 on the familiar desktop. Just about every third party application needed to adopt the new RT APIs and run "Metro"-style.

If they go down this road, Apple will not put constraints like that on their Mac developers: I'd expect that all the most commonly used frameworks would continue to be available, and porting most applications would amount to a recompile against a new Xcode.

rmrfrmrf · on April 1, 2014

I never once gave much thought to the idea that Apple would somehow shut out unsigned apps from its desktop computers, for the main reason that there would be too many existing apps that would suddenly lose functionality and cause a backlash.

With a port to ARM, though, Apple can simply say that the developers haven't ported their app to the new architecture yet, and that more apps will become available as developers catch up. Meanwhile, all those apps that currently require some kind of lower-level access could be banned.

On the other hand, I think that would be a huge boon for non-power users; it would be nearly impossible for malware to get onto the computer, and even annoyances like Adobe's and Microsoft's auto-updaters would finally get funneled though the App Store, which would prevent programs like that from constantly occupying memory/CPU/network.

For this to work, though, I think Apple would be wise to include some kind of developer mode feature (perhaps even forking over the $99/year fee) that would allow unsigned or potentially dangerous apps to run.

All of this (admittedly wild; sorry, coffee is kicking in) speculation definitely has me excited for the future of Apple, though! I haven't felt like that since Steve was around.

mikeash · on April 1, 2014

"All of this (admittedly wild; sorry, coffee is kicking in) speculation definitely has me excited for the future of Apple, though! I haven't felt like that since Steve was around."

Wacky. Your vision of shutting out unsigned apps has me looking for the exits. I've been an Apple user since the 80s and I'm pretty sure this would finally make me jump ship. Apple's current Big Brother trajectory frightens me to no end.

rmrfrmrf · on April 1, 2014

I'm pretty sure this won't change your mind, but my view is more-or-less "better the devil I know." Trading in Microsoft and Adobe's phone-homers for Apple's is fine in my book.

mikeash · on April 1, 2014

I don't use Microsoft or Adobe software as it stands. Getting rid of a couple of third-party updaters I don't even use in exchange for giving Apple complete control over whether or not I'm allowed to run something is an awful tradeoff. It's just barely tolerable on iOS because I use those devices more as appliances than computers, but it's not workable for me on a real computer.

mcculley · on March 31, 2014

I wouldn't imagine porting OS X to ARM would be the hurdle as most of it is already done in that iOS is most of OS X. Apple has full control of that source code base and probably already has done this, like they did during the transition to IA-32.

It would be the third party applications that would be a hurdle. Without a transparent layer like the deprecated Rosetta, they would not be able to run existing applications and would have to explain that a new MacBook Air can't run all of the old MacBook Air applications.

Maybe if they come out with an ARM powered laptop, they will call it something other than MacBook Air to avoid that confusion.

gonzo · on March 31, 2014

Just the Apple Air, I'd bet.

Don't forget that Apple has spent years getting everyone on Xcode, and disciplining them to avoid the parts that would make a recompile & test straight-forward.

They also have a "good enough" office suite (for many people), as well as the iWork apps.

Such a device could potentially also run iPad apps without much change.

wiml · on March 31, 2014

The machine-specific parts of OSX already run on ARM in the form of iOS.

Also remember Apple has already successfully made two or three architecture transitions already: m68k to PPC to i386 to x86_64.

justincormack · on March 31, 2014

And the (brief) PPC64 transition.

coldtea · on March 31, 2014

They've already ported it. iOS is OSX for ARM plus different UI stuff -- all the kernel, basic userland etc are there.

So, it's mainly a matter of porting the higher level stuff -- mainly the Cocoa libs, which are slightly different on iOS vs desktop OS X.

sliverstorm · on March 31, 2014

I guess I'm not familiar with just how much iOS and OSX shares.

rsynnott · on March 31, 2014

Given Apple's history with this stuff, there's a very good chance that that's already done. Intel was supported internally for years before the PPC -> Intel change.

cpeterso · on March 31, 2014

Jobs claimed that Apple maintained a secret x86 port of OS X for five years before the public release. A small cost.

leoc · on March 31, 2014

Apparently this is true enough, with the added qualification that for the first year or so it was largely secret from Apple too. :) http://www.theregister.co.uk/2012/06/11/apple_project_markla... http://www.quora.com/Apple-company/How-does-Apple-keep-secre... (Though of course the fact that the OS X codebase had been kept cleanly portable was neither one person's secret project nor a matter of chance.)

rsynnott · on April 1, 2014

Makes perfect sense, really; not only would it be a hedge against having to adopt Intel, but just the value of having it around to show IBM people when negotiating over the PPC would probably have been worth the cost.

gozzoo · on March 31, 2014

I don't think they will go that path. It would make much more sense to introduce a desktop version of iOS itself. iOS has already everything that's needed for a light desktop OS. It would provide much better experience than ChromeOS, there are already thousands of applications doing about anything and it would require very little effort to adapt iOS to run as a desktop OS and to adapt the existing applications. I think that a hint for that is the recent port of Microsoft Office. If users want more sophisticated OS they can just use 13" MacBook Pro. It's quite easy to guess that the next version will be thinner and lighter and probably cheaper - very similar to the current MacBook Air.

userbinator · on April 1, 2014

> but the lion's share of the challenge.

This would've been the perfect phrase last year.

lallysingh · on April 1, 2014

If they wanted to, sure. They have all the infrastructure ("fat binaries") from the ppc switch.

BugBrother · on March 31, 2014

The volumes for desktop/laptop are almost trivial compared to the volumes for handheld. (At least for Apple.) So making the processors in volume for a Macbook Air would probably not be a problem.

Also, Apple probably have little problem getting preferential treatment from manufacturers.

masklinn · on March 31, 2014

> The volumes for desktop/laptop are almost trivial compared to the volumes for handheld. (At least for Apple.)

Yep. IIRC Apple ships something like 10 million laptops/year. They ship 3 times that in phones… every quarter

glasshead969 · on March 31, 2014

Here is the LLVM Commit the article refers to.

https://github.com/llvm-mirror/llvm/blob/7b837d8c75f78fe55c9...

doe88 · on March 31, 2014

And for those interested the whole commit and corresponding discussion thread on the LLVM-dev mailing list:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-March/thread...

https://github.com/llvm-mirror/llvm/commit/7b837d8c75f78fe55...

sanxiyn · on March 31, 2014

"Cyclone is a wide machine. It can decode, issue, execute and retire up to 6 instructions/micro-ops per clock."

This is truly amazing. In comparison, Cortex-A15 can issue 3 instructions per cycle, for example.

higherpurpose · on March 31, 2014

Nvidia's Denver is said to have 7. It's also rumored to have 2x the performance of Cortex A15. If it's true it could reach Sandy Bridge performance level either at first or 2nd gen (on FinFET 16nm) and Haswell/Broadwell level by 3rd gen (most people don't realize the performance between Haswell/Broadwell compared to Sandy Bridge is only like 15-20 percent, since Intel stopped focusing on performance). I also think in 2nd gen Denver SoC (2 process nodes from Tegra K1), Nvidia will have at least a 1 Tflops, possibly 1.2 Tflops GPU (Xbox One level).

ChuckMcM · on March 31, 2014

It is interesting that in the 'chip wars' everyone went to Intel for their chips (or Motorola) and that allowed a lot of R&D to be re-couped. But as Apple is baking their own chips, what is the market for Nvidia's Denver? It has to compete with Broadcom, Samsung, and others for a fraction of the Android device pool? Challenging to justify the levels of R&D that Apple is applying without a predictable return.

That said, I'm going to be watching for the Denver chip, it will be an interesting counter point to the A7.

glasshead969 · on March 31, 2014

By the time Denver based devices are on market, Apple will be releasing A8 based devices, so it will probably be competing with A8 rather than A7.

rsynnott · on April 1, 2014

> But as Apple is baking their own chips, what is the market for Nvidia's Denver?

Essentially, Qualcomm's market.

ris · on April 1, 2014

Never believe anything about unreleased NVidia SOCs

userbinator · on April 1, 2014

Keyword being "up to". There's a reason Intel hasn't really upped the issue width much, despite x86 having higher code density than ARM -- memory bandwidth. It's hard to keep wide execution cores saturated.

marcosdumay · on March 31, 2014

Whatching those Mill presentations just spoiled me...

jpwright · on April 1, 2014

Cortex A-15 can also run at nearly double the clock rate though: 2.5 GHz vs 1.4 GHz.

rsynnott · on April 1, 2014

> Cortex A-15 can also run at nearly double the clock rate

Cortex A-15 _does_ run at nearly double the clock rate (though with a substantially higher energy consumption on the rare occasion it actually scales up all the way). Apple is historically very conservative with its mobile device clock speeds, and I'd expect that Cyclone has a fair bit of headroom here.

jpwright · on April 1, 2014

I say "can" because most Cortex-A15 design implementations don't actually run that fast. Mostly 1.6-2 GHz. Of course, we are comparing a design that's licensed out from ARM to a fixed product from Apple, so it's not a totally fair comparison.

http://en.wikipedia.org/wiki/ARM_Cortex-A15#Systems_on_a_chi...

watersb · on March 31, 2014

Anand also taught us that an iPad draws about as much power as an 11-inch MacBook Air.

Configure an ARM platform with 8 GB of DDR3 DRAM and a PCIe-based SSD, and you may well blow the power budget versus the current Intel platform design.

Intel has turned its attention to total platform power: moving the VRMs on-die, and also identifying third-party motherboard components which draw silly amounts of power.

I think that Intel is on track here. Apple will of course push their design talent in order to deliver the best mobile devices they possibly can. But this does not necessarily mean a move away from Intel for a laptop or desktop.

tdicola · on March 31, 2014

Very cool analysis of the new chip. If I were Intel I would be more than a little concerned about what Apple is up to here. Would be very interesting to see what Cyclone can do in a proper laptop/desktop with more RAM.

gatehouse · on March 31, 2014

I'm impressed with Apple's work and it reaches all the way down to the user experience (I can play GTA San Andreas on my 5s), but the pace of Intel's research over the last 10 years has been absolutely demonic. I think for Apple to compete on that front would require an extreme investment.

For example see the second chart here: http://preshing.com/20120208/a-look-back-at-single-threaded-...

sanxiyn · on March 31, 2014

I guess Apple is plucking low hanging fruits, but speedup from Apple A6 to Apple A7 (spaced almost exactly a year) is about 50% in average, which corresponds to more than 2 years of speedup in recent Intel cores. I also note that this speedup was achieved without increasing clock, which is not the case for Intel.

CJefferson · on March 31, 2014

It is easy to get closer to your competitors when you start so far behind them.

gatehouse · on March 31, 2014

Intel released a P4 @ 3.73GHz in 2004, and the i7-4771 is "only" 3.9GHz in single core mode (i.e. turbo), so that's only a slight increase in clock in the last decade.

Sooner or later mobile processors are going to hit the power wall. I haven't looked into this lately but one thing that could provide a fixed and permanent benefit to mobile processors is the more modern instruction set.

sanxiyn · on March 31, 2014

Well, P4's clock was achieved with 31-stage pipeline, which was a bad idea. If you compare with what came after P4, there was more than slight increase in clock.

zokier · on March 31, 2014

Willamette and Northwood (ie the original P4s) cores had 20 stage pipeline, compared to Nehalem with 24 stages and Sandy, Ivy and Haswell having 19 stages.

And those Northwood cores reached 3.4GHz, compared to 3.8GHz of the infamous Prescott with it's 31 stages. That is fairly meager clockspeed increase, and based on that I'd argue that the extreme pipeline depth was not the major contributing factor in pushing the clockspeed of P4 higher.

raverbashing · on March 31, 2014

Except the P4 was a very inefficient architecture which relied on clock speeds (and the power of Marketing) rather than actual computing power

Especially the first P4s, those were awful. A Tualatin P3 @ 1.2/1.3GHz would be much faster than a P4 @ 1.4 or even 1.6

fidotron · on March 31, 2014

The only reason for Intel's R&D spend over that time is that mobile chips are even more terrifying, especially once you factor in the compute capacity on the GPU.

The memory bottleneck is the real problem, but once you start getting systems conceptually related to nVidia's Tegra K1 (or even the PS4 system architecture) where the CPU is relegated to a sort of system co-ordinator Intel are going to have a major headache and find themselves in that single threaded performance niche that kept the SPARC and POWERs of the world occupied prior to their decline.

prewett · on March 31, 2014

spendING. "spend" is a verb, "spending" is a noun.

(Yes, I know marketing says "ad spend" but it's still wrong)

goldenkey · on March 31, 2014

You shouldn't even get a reply considering that you're comment is pedantic and prickly. But you're flat out wrong. The worst combination for a bafoon.

nouninformal noun: spend; plural noun: spends 1. an amount of money paid for a particular purpose or over a particular period of time. "the average spend at the cafe is about $10 a head"

te_chris · on April 1, 2014

*"buffoon". I'm sorry.

goldenkey · on April 1, 2014

Thank you, I needed that.

coolnow · on April 1, 2014

*your comment

I'm sorry.

userbinator · on April 1, 2014

I would be very interested to see some ARM results in that chart in the future; at the moment, all the RISCs there are pretty dismal.

Not so long ago I remember doing some calculations on the SPEC results, normalising them to a single thread and clock frequency to determine the per-cycle efficiency of various CPUs, and x86 came out at least a factor of 2-3x ahead of the rest.

jobu · on March 31, 2014

It really seems like Apple is ahead of the curve on mobile R&D. I wonder if they'll ever consider reselling some of these chips. It's unlikely Steve Jobs would've ever done it, but I'm not so sure about Tim Cook.

kevinchen · on March 31, 2014

Apple bought PA Semi so they could be in charge of chip design and not have to share with anyone else. It's a competitive advantage, and Tim Cook isn't stupid. He won't license away a crown jewel for a few short-term dollars.

rsynnott · on April 1, 2014

People mightn't want to buy them if they did. These chips are _big_, which means that they're expensive to make. Apple can get away with it because their only costs are manufacture and licensing, and because they have high margins which can absorb a bit of a hit. If they were selling them, though, they'd presumably want to make a profit, and that profit would make the end product almost certainly the most expensive mobile chip on the market.

From a marketing point of view, too, it'd be a hard sell in Android-land. Apple has been very careful to steer clear of spec-oriented marketing, but can you imagine the less-sophisticated enthusiast market's response to, say, the Galaxy S6 using a dual-core 1.3GHz chip instead of a quad-core 2GHz?

jobu · on April 1, 2014

What about something for the server market? I know some companies have been switching to ARM chips as a low-power alternative in large server farms. It seems a 64-bit chip might be a good fit in that market.

rmrfrmrf · on April 1, 2014

IMO it's unlikely. It seems like Apple has to go through a lot to meet their own manufacturing needs, so ramping up production just for third parties would put even more strain on the already-large demand.

Plus, from a market standpoint, Cyclone was developed to meet Apple's own needs. To sell these chips would introduce a "demand" variable into the equation that I think would stifle development.

At any rate, this is a really interesting topic because I think that Apple's oft-critized isolation actually worked much to its own benefit here.

chucknelson · on March 31, 2014

Pretty impressive that Apple is capable of such CPU disruption with just a few small acquisitions. Is PA Semi the main reason for this, or could Apple have been building up a CPU design team for years before that acquisition?

bryanlarsen · on March 31, 2014

Is a $270 million acquisition a small acquisition these days? Not to mention the fact that they acquired a team responsible for two large CPU disruptions: DEC Alpha and StrongARM. A third disruption is perhaps not so surprising.

chucknelson · on March 31, 2014

Just seems like billion is the new million when acquisitions are concerned, but yes, I guess it wasn't that small overall. Did not realize PA Semi was founded by a lead designer at DEC. Thanks for the info, and Wikipedia filled in the rest for me :)

rch · on March 31, 2014

Didn't some of the Transmeta people end up at PA Semi too?

ddebernardy · on March 31, 2014

If memory serves, Cook mentioned that Apple's semiconductor division was larger than Intel during one of the last investor conference calls.

sliverstorm · on March 31, 2014

What? Apple has 80,000 employees. Intel has 105,000. So I doubt any Apple division is larger than Intel, if you catch my drift.

MrVitaliy · on March 31, 2014

I doubt all 105k Intel employees are hardware engineers. Half of them are probably sales.

Theodores · on March 31, 2014

Meanwhile ARM is just a couple of (thousand) guys in Cambridge, drinking tea and talking about the weather.

qq66 · on March 31, 2014

52% of Apple employees are in Apple Stores through the retail division.

sliverstorm · on March 31, 2014

He said 'bigger than Intel', not 'bigger than Intel's same department'

stonemetal · on March 31, 2014

There's little benefit in going substantially wider than Cyclone, but there's still a ton of room to improve performance.

Didn't AMD, and Intel more or less say the same thing about 3 wide? No real benefit from going wider. Is that because of the differences in micro architecture or is it more about getting a little more performance without having to ramp the clock speed? What makes 6 wide good for ARM but not x86 or x64?

fournm · on March 31, 2014

This is probably completely and utterly wrong as it's just a guess, but potentially the Thumb [1] instructions (small, limited subset of shorter instructions in ARM) might allow for a wider setup. Thumb instructions make a bunch of simplifying assumptions that might remove some of the issues with going wider. Not that I have any idea if Cyclone even supports them in the first place.

[1] http://en.wikipedia.org/wiki/ARM_architecture#Thumb

wtallis · on March 31, 2014

The biggest wart on the ARM instruction set was the predicate attached to basically every instruction. That's gone in the 64-bit ARM ISA, so there's no need for a 64-bit Thumb (and there isn't one).

leejoramo · on March 31, 2014

On balance, I think that Apple would use most of the energy savings from reducing to 20nm for other things: longer life and lower weight for the battery, improvements in the camera, wireless, increased RAM.

antimagic · on March 31, 2014

In the iPhone definitely. For the iPad, it already has more than adequate battery life (most people get several days use out of one before needing to recharge), so the extra clock speed could be used to handle multiple apps on-screen simultaneously (for example).

JTenerife · on April 1, 2014

Good on Apple! That's the best that can happen to us customers. Even for an Android / Windows guy like me :-).

szatkus · on March 31, 2014

I wonder if Nvidia Denver could match with this chip.

NextUserName · on March 31, 2014

Congrats Apple for all your achievements. It is amazing what can be accomplished when you employ Chinese Engineers and manufacturers who's tech espionage (they steal trade secrets) is one of their best known traits/assets.

axman6 · on April 1, 2014

Proof? Even informed speculation?

Apple have made several aquisitions in the last few years to give them the technology they need to make these sorts of developments. Even if some of the tech had been stolen, it would require a lot of work to put it into practice.

I know I'm feeding the troll, but I couldn't help it; this is just too ridiculous.

towski · on March 31, 2014

What is a tock?

stephencanon · on March 31, 2014

LMGTFY: http://en.wikipedia.org/wiki/Intel_Tick-Tock =)

CodeWithCoffee · on March 31, 2014

Apple also uses a similar strategy with the iPhone. The iPhone 4 and 5 both made major form changes (retina display and screen shape) whereas the 4S and 5S were focussed on refinement (faster CPU, 64-bit CPU respectively). It has also been noted (possibly on the Accidental Tech Podcast, or somewhere similar) that a 'tick' year for the phone tends to be a 'tock' year for iOS and vice versa; iOS 5 and iOS 7 (released with the 4S and 5S respectively) were bigger changes than iOS 4 and 6.

raverbashing · on March 31, 2014

I am thinking that this is looking up to be what the G5 aimed to be.

Unfortunately the ARM architecture, even with those optimizations is probably slower "clock by clock" compared with x86/PPC

But yes, I think this is something Apple is probably testing (Desktop Mac OSX on ARM)