
Two reasons for x86 (IMO) and three reasons for ARM.

I've thought for years that the overhead of the extra decoder hardware and legacy cruft was non-trivial (though Intel claims that's not true). The evolution of ARMv8 (where ARM went much closer to its RISC roots) seems to disagree with Intel's claim. This explains the performance-per-watt issue (and potentially some IPC).

That said, scaling IPC (instructions per clock) seems to have a pretty hard limit. x86 has basically hit a wall, and it's been lots of time and research for small gains. Additionally, the biggest challenge in large systems is that the cost to do a calculation on some piece of data is often less than the cost to move that data to and from the CPU. As Apple increases cache size and frequency and starts dealing with bigger interconnect issues, I suspect we'll see a distinct damper on their performance gains.

Qualcomm (and ARM, as the designer of the cores) have a very different problem to solve. They can't make money off of software. They make money when they sell new chips, and they make more money from new designs than from old ones. This means incremental changes to ensure a steady revenue stream. Since Apple having a fast, proprietary CPU doesn't actually affect Qualcomm or ARM, they most likely don't even see themselves as in direct competition. Most people buy Android or iOS phones for reasons other than peak CPU performance, and Qualcomm is fairly competitive on a lot of those (especially actual power usage).

A further complication is that they also need "one design to rule them all". They can't afford to make many different designs, so they make one design that does everything. Apple doesn't need to spend loads of time and money trying to optimize the horrible aarch32 ISA; instead, they spend all that time on their aarch64 work. ARM and Qualcomm, however, need to keep aarch32 support so the markets that want it still buy their chips.

Apple shipped their large 64-bit design only a couple of years after the ISA was introduced. Put simply, that is impossible. It takes 4-5 years to make a new, high-performance design. It took ARM 3 years for their small design (basically upgrading the existing A9 to the new ISA), closer to 4.5 years to actually ship their large design (A57), and another year for a "fixed" design (A72, though that's actually a different design team and uarch). Though the gap has been closing, 2.5 years in the semicon business is an eternity.

A crufty ISA and non-CPU scaling problems seem to explain Intel/AMD. A late start, bigger market requirements, and perverse incentives against increasing performance seem to explain ARM/Qualcomm.




It's hard to buy that the ISA really has anything to do with it. As you mention, Apple has a fairly narrow market target for their cores. Both Intel and AMD are basically building server cores (can you say threading?) and selling them as client devices, mostly because that is where the real money is for them. Apple, OTOH, is building a client/mobile core, and they benefit from a number of "features" they can enable, which are known performance problems in the desktop/etc. space but continue there for legacy reasons. Combine that with Intel basically standing still for the last ~5 years, and the tables have turned as far as who is ahead on process+microarch.

Basically, a lot of Apple's advantages are:

1: Complete vertical control of compiler+OS+hardware

2: Plenty of margin to spend on extra die area

3: More advanced process at TSMC

4: Very narrow focus: Apple has only a few models of iPhone+iPad, whereas Intel has dozens of different dies they modify/sell into hundreds of product lines, so everything is a compromise.

Any one of those four gives them a pretty significant advantage; the fact that they benefit from all four cannot be discounted.


aarch32 decode is far less complex than x86, and aarch64 is even less complex than that. On the power consumption side, decoders definitely make a difference. They use tons of power, and a huge number of instructions means needing a large, power-hungry instruction decode cache.

In addition, complex instruction decoding requires more decode stages, and this isn't a trivial cost. Intel can shave off several stages on a decode cache hit, and that's not counting the stages that are required regardless (even AMD's simple Jaguar core can take 7+ decode stages). Whenever you have a branch miss, you pay for the full pipeline refill, so fewer necessary decode stages reduces that penalty.


OTOH, you have x86 using what is effectively a compressed instruction encoding, plus a trace cache (which is advantageous enough that ARM designs are apparently using them now too), which reduces the size of the icache for a given hit rate. So the arch loses a bit here and gains a bit elsewhere. It's the same thing with regard to TSO: a more relaxed memory model buys you a bit in single-threaded contexts, but TSO frequently lets you completely avoid locks/fencing in threaded workloads, where fences are far more expensive.

So people have been making these arguments for years, frequently with myopic views. These days, what seems to be consuming power on x86 is fatter vector units, higher clock rates, and more IO/memory lanes/channels/etc. Those are things that can't be waved away with alternative ISAs.


If it were vectors, clocks, and memory, then Atom would have been a success, but even stripping all of that out resulted in a chip (Medfield) that underperformed while using way too much power.

Either the engineers at Intel and AMD are bad at their job (not likely) or the ISA actually does matter.


Atom is a success, just not where you think it is. The latest ones are quite nice for their power profile and fit into a number of low-end edge/embedded devices in the Denverton product lines. Similarly, the Gemini Lake cores are not only in a lot of fairly decent low-end products (pretty much all of Chuwi's product lines are built on the N4100, https://www.chuwi.com/), they also make perfectly capable, very low-cost digital signage devices/etc.

So, not as sexy as phones, but the power/perf profiles are very competitive with similar ARM devices (A72). If you compare the power/perf profile of a Denverton part with something like the SolidRun MACCHIATObin, the Atom is way ahead.

Check out https://www.dfi.com/ for ideas about where Intel might be doing quite well with those Atom/etc. devices.


Conversely, if the instruction set were the main factor, you'd expect Qualcomm and Samsung to also have ARM processors with a similar power-to-performance advantage over Intel chips.

The reality is just that Apple is ahead in chip design at the moment.


They are 2 years behind Apple and slowly catching up.

When Medfield came out, Apple didn't have its own chip, and x86 still lost. It was an entire 1.5 nodes smaller and only a bit faster than the A9 chips of the time (and only in single-core benches). The A15, released not too long after, absolutely trounced it.


>When Medfield came out, Apple didn't have it's own chip

>It was an entire 1.5 nodes smaller and only a bit faster than the A9 chips of the time

You seem to have the chronology all mixed up here. Medfield came out in 2012. The A9 came out in 2015. Apple was already designing its own chips in 2012. (The A4 came out in 2010.)


> Apple shipped their large 64-bit design only a couple years after the ISA was introduced.

Actually ARM cores were available earlier than that, just nobody wanted to license them until the elephant in the room (Samsung) forced everybody to follow.


Timeline

2011 -- ARM announces 64-bit ISA

2012 -- ARM announces they are working on the A53 and A57, and AMD announces they'll be shipping the Opteron A1100 in 2014.

2013 -- The Apple A7 ships, doubling performance over ARM's A15 design.

2013 -- A Qualcomm employee leaks that Apple's timeline floored them and that their roadmap was "nowhere close to Apple's" (Qualcomm seems to have switched to the A57 design around here in desperation -- probably why the 810 was so disliked and terrible).

2014 -- Apple ships the A8 improving performance 25%.

early 2015 -- Samsung and Qualcomm devices ship with the A57. Anandtech accurately describes it: "Architecturally, the Cortex A57 is much like a tweaked Cortex A15 with 64-bit support." Unsurprisingly, the performance is very similar to the A15's.

late 2015 -- Apple ships A9 with a 70% boost in CPU performance.

later 2015 -- Qualcomm ships the custom 64-bit Kryo architecture as the 820. It regresses in some areas, but offers massive improvements in others for something close to a 30% performance improvement over the 810 with its A57 cores.

2016 -- AMD finally launches the A1100. ARM finally ships the A72, their first design really tailored to the new 64-bit ISA.

Final Scores

Apple -- 2 years to ship new high-performance design

ARM -- 4 years to ship high-performance design, 5 years for new design

Qualcomm -- 4.5 to 5 years to ship a new high-performance design

Sorry, something's definitely fishy. Nobody can design and ship that good of a processor in less than 2 years.

https://www.hardwarezone.com.my/tech-news-qualcomm-employee-...



