
Two reasons for x86 (IMO) and three reasons for ARM.

I've thought for years that the overhead of the extra decoder hardware and legacy cruft was non-trivial (though Intel claims that's not true). The evolution of ARMv8 (where ARM went much closer to its RISC roots) seems to disagree with Intel's claim. This explains the performance-per-watt issue (and potentially some IPC).

That said, scaling IPC (instructions per clock) seems to have a pretty hard limit. x86 has basically hit a wall, and it's been lots of time and research for small gains. Additionally, the biggest challenge in large systems is that the cost to do a calculation on some piece of data is often less than the cost to move that data to and from the CPU. As Apple increases cache size and frequency and starts dealing with bigger interconnect issues, I suspect we'll see a distinct damper on their performance gains.

Qualcomm (and ARM, as the designer of the cores) have a very different problem to solve. They can't make money off of software. They make money when they sell new chips, and they make more money from new designs than from old ones. This means incremental changes to ensure a steady revenue stream. Since Apple having a fast, proprietary CPU doesn't actually affect Qualcomm or ARM, they most likely don't even see themselves as in direct competition. Most people buy Android or iOS phones for reasons other than peak CPU performance, and Qualcomm is fairly competitive on a lot of those (especially actual power usage).

A further complication is that they also need "one design to rule them all". They can't afford to make many different designs, so they make one design that does everything. Apple doesn't need to spend loads of time and money trying to optimize the horrible aarch32 ISA; instead, they spend all that time on their aarch64 work. ARM and Qualcomm, however, need to keep aarch32 support so the markets that want it still buy their chips.

Apple shipped their large 64-bit design only a couple of years after the ISA was introduced. Put simply, that is impossible. It takes 4-5 years to make a new, high-performance design. It took ARM 3 years for their small design (basically upgrading the existing A9 to the new ISA), closer to 4.5 years to actually ship their large design (A57), and another year for a "fixed" design (A72, though that's actually a different design team and uarch). Though the gap has been closing, 2.5 years in the semicon business is an eternity.

A crufty ISA and non-CPU scaling problems seem to explain Intel/AMD. A late start, bigger market requirements, and perverse incentives against increasing performance seem to explain ARM/Qualcomm.




It's hard to buy that the ISA really has anything to do with it. As you mention, Apple has a fairly narrow market target for their cores. Both Intel and AMD are basically building server cores (can you say threading?) and selling them as client devices, mostly because that is where the real money is for them. Apple, OTOH, is building a client/mobile core, and they benefit from a number of "features" they can enable, which are known performance problems in the desktop/etc. space but continue there for legacy reasons. Combine that with Intel basically standing still for the last ~5 years, and the tables have turned as far as who is ahead on process+microarch.

Basically, a lot of Apple's advantages are:

1: Complete vertical control of compiler+OS+hardware

2: Plenty of margin to spend on extra die area

3: More advanced process at TSMC

4: Very narrow focus: Apple has only a few models of iPhone+iPad, whereas Intel has dozens of different dies they modify/sell into hundreds of product lines, so everything is a compromise.

Any one of those four gives them a pretty significant advantage; the fact that they benefit from all four cannot be discounted.


aarch32 decode is far less complex than x86, and aarch64 is even less complex than that. On the power consumption side, decoders definitely make a difference. They use tons of power, and a huge number of instructions means needing a large, power-hungry instruction decode cache.

In addition, complex instruction decoding requires more decode stages, and this isn't a trivial cost. Intel can shave off several stages on a decode cache hit, and that's not counting the stages that are required regardless (even AMD's simple Jaguar core can take 7+ decode stages). Whenever you have a branch miss, you pay for the full pipeline refill, so fewer necessary decode stages reduces that penalty.


OTOH, you have x86 using what is effectively a compressed instruction encoding, plus a trace cache (which is advantageous enough that ARM designs are apparently using them now too), which reduces the size of the icache for a given hit rate. So the arch loses a bit here and gains a bit elsewhere. It's the same thing with regard to TSO: a more relaxed memory model buys you a bit in single-threaded contexts, but TSO frequently lets you completely avoid locks/fencing in threaded workloads, where fences are far more expensive.

So people have been making these arguments for years, frequently with myopic views. These days, what seems to be consuming power on x86 is fatter vector units, higher clock rates, and more IO/memory lanes/channels/etc. Those are things that can't be waved away with alternative ISAs.


If it were vectors, clocks, and memory, then Atom would have been a success, but even stripping all of that out resulted in a chip (Medfield) that underperformed while using way too much power.

Either the engineers at Intel and AMD are bad at their job (not likely) or the ISA actually does matter.


Atom is a success, just not where you think it is. The latest ones are quite nice for their power profile and fit into a number of low-end edge/embedded devices in the Denverton product lines. Similarly, the Gemini Lake cores are not only in a lot of fairly decent low-end products (pretty much all of Chuwi's product lines are built on the N4100, https://www.chuwi.com/), they also make perfectly capable, very low-cost digital signage devices/etc.

So, not as sexy as phones, but the power/perf profiles are very competitive with similar ARM devices (A72). If you compare the power/perf profile of a Denverton part with something like the SolidRun MACCHIATObin, the Atom is way ahead.

Check out https://www.dfi.com/ for ideas about where Intel might be doing quite well with those Atom/etc. devices.


Conversely, if the instruction set were the main factor, you'd expect Qualcomm and Samsung to also have ARM processors with a similar power-to-performance advantage over Intel chips.

The reality is just that Apple is ahead in chip design at the moment.


They are 2 years behind Apple and slowly catching up.

When Medfield came out, Apple didn't have its own chip, and x86 still lost. It was an entire 1.5 nodes smaller and only a bit faster than the A9 chips of the time (and only in single-core benches). The A15, released not too long after, absolutely trounced it.


>When Medfield came out, Apple didn't have it's own chip

>It was an entire 1.5 nodes smaller and only a bit faster than the A9 chips of the time

You seem to have the chronology all mixed up here. Medfield came out in 2012. The A9 came out in 2015. Apple was already designing its own chips in 2012. (The A4 came out in 2010.)


> Apple shipped their large 64-bit design only a couple years after the ISA was introduced.

Actually ARM cores were available earlier than that, just nobody wanted to license them until the elephant in the room (Samsung) forced everybody to follow.


Timeline

2011 -- ARM announces 64-bit ISA

2012 -- ARM announces they are working on the A53 and A57, and AMD announces they'll be shipping the Opteron A1100 in 2014.

2013 -- The Apple A7 ships, doubling performance over ARM's A15 design.

2013 -- A Qualcomm employee leaks that Apple's timeline floored them and that their roadmap was "nowhere close to Apple's" (Qualcomm seems to have switched to the A57 design around here in desperation -- probably why the 810 was so disliked and terrible).

2014 -- Apple ships the A8 improving performance 25%.

early 2015 -- Samsung and Qualcomm devices ship with the A57. Anandtech accurately describes it: "Architecturally, the Cortex A57 is much like a tweaked Cortex A15 with 64-bit support." Unsurprisingly, the performance is very similar to the A15's.

late 2015 -- Apple ships A9 with a 70% boost in CPU performance.

later 2015 -- Qualcomm ships the custom 64-bit Kryo architecture as the 820. It regresses in some areas, but offers massive improvements in others for something close to a 30% performance improvement over the 810 with its A57 cores.

2016 -- AMD finally launches the A1100. ARM finally ships the A72, their first design really tailored to the new 64-bit ISA.

Final Scores

Apple -- 2 years to ship new high-performance design

ARM -- 4 years to ship high-performance design, 5 years for new design

Qualcomm -- 4.5 to 5 years to ship a new high-performance design

Sorry, something's definitely fishy. Nobody can design and ship that good of a processor in less than 2 years.

https://www.hardwarezone.com.my/tech-news-qualcomm-employee-...



