Yep. One of my roommates from college worked at HP during the Itanium development. It was great on paper, but customers didn't want it or need it. The bigger picture is that throwing everything away every few model years isn't "innovation," it's planned-obsolescence consumerism and pointless churn. Turing completeness and standardization > whiz-bang over-engineering.
> The bigger picture is that throwing everything away every few model years isn't "innovation," it's planned-obsolescence consumerism
x86 is a terrible architecture, though. The world would be a much better place without it.
> Turing completeness and standardization > whiz-bang over-engineering
Turing completeness and standardization are fine, but there were and are better things to standardize on. "Something with a reasonable amount of thought going into it" is not "over-engineered."
Claiming "planned-obsolescence" would be reasonable if people weren't going through all of this work to maintain fifty years of (near) compatibility with an architecture originally used in a calculator.
You aren't wrong that x86 is terrible. But it's only terrible because it has survived so long and offered so much value through so many periods of change. I believe that any architecture that lives long enough will become "terrible".
It's not a bug. It's the scar tissue of hard-won success. Long live x86!
x86 is a terrible architecture. If we changed it overnight, the world would be an extremely marginally better place (if we ignore backward compatibility). In reality, on large powerful machines the ISA matters very little. Where the ISA matters (low-end mobile, embedded), x86 has never been a thing.
x86 is a lousy architecture, but x86-64 isn't as bad; at least it has a good number of registers unlike x86.
It's easy to prove x86 is lousy by seeing its success in embedded and mobile devices: it hasn't had any, despite trying (Atom). ARM reigns supreme here.
x86 manages to hold on because of 1) backwards compatibility with proprietary software (namely Windows), and 2) inertia. We don't really notice it that much because we've covered over it with abstraction layers: almost no one writes assembly code any more.
We would be better off if we switched to something better, but we wouldn't notice it much; we might see a slight amount of power savings, and OS developers would be happier. But the benefits just aren't worth the costs.

It's unfortunate, though: the major chipmakers should be able to just develop a nice, clean-sheet architecture which retains the good parts of x86 (like the PCIe bus and enumeration, things missing on ARM) and cleans up the bad ones, and users shouldn't have to do anything other than select the Linux distro that matches that architecture. The presence of binary-only proprietary software that only works on x86(-64) is probably the biggest reason we're stuck with it: a competing CPU maker can't just come up with something new and have it "just work" (after getting their compiler changes merged into GCC and LLVM, of course).
> x86 is a lousy architecture, but x86-64 isn't as bad; at least it has a good number of registers unlike x86.
Funny thing.
Whether you boot a modern x86 system in 32-bit mode or 64-bit mode doesn't change the number of registers you're actually using. You're still using the 32-128 physical registers on the core. That's why x86_64 code isn't particularly faster (and is sometimes slower) than the same code compiled for x86_32 mode.
People have this zany idea that assembly language is a low level language. It's not. When the CPU executes 32 bit x86 code it "compiles" it to uops that are totally unrecognizable to us and use dozens to hundreds of registers.
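A tiny sketch (my own C, nothing authoritative) of what that looks like from the source side. Compiled for x86_32, both accumulator chains below cycle through the same few architectural registers, but the renamer gives every in-flight value its own physical register, so the chains overlap anyway:

    #include <stddef.h>

    /* Two independent accumulator chains. The compiled asm reuses the
       same handful of architectural registers each iteration; the
       core's register renamer maps each in-flight value to a distinct
       physical register, so the chains execute in parallel. */
    long dot2(const long *a, const long *b, size_t n) {
        long s0 = 0, s1 = 0;
        for (size_t i = 0; i < n; i++) {
            s0 += a[i];   /* chain 0 */
            s1 += b[i];   /* chain 1: no dependency on chain 0 */
        }
        return s0 + s1;
    }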
The thing about embedded kinda exposes your mental bias. When you scale up the x86 instruction decoder from Intel Atom scale to Xeon scale, the instruction decoder gets a little bit more complicated, but it's given a ton more tools to use. So sure, Atom sucks at embedded, but x86 is still king at desktop and beyond, and ARM will never be able to challenge it.
If ARM wanted to challenge x86 in terms of single-threaded performance, it would need to do the same thing x86 does: have a super complicated instruction decoder that maps the 32 logical registers defined by the ISA to its 128 physical ones, reschedules everything, renames stuff where appropriate, identifies loads that can be elided, etc. And all of the advantages of having a simple ISA go out the window, because the ISA is an illusion.
Unfortunately Intel's Architecture Code Analyzer is dead. LLVM MCA is almost as good. I recommend you play around with it a bit sometime. CPUs kinda don't give a crap if you're speculating four loop iterations ahead and you're just using the same few registers over and over again for multiple purposes.
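For anyone who wants to try it, a minimal session looks something like this (assuming clang and llvm-mca are installed on an x86-64 box; the -mcpu value is just an example):

    /* loop.c -- generate assembly and pipe it into llvm-mca:
     *
     *   clang -O2 -S -o - loop.c | llvm-mca -mcpu=skylake
     *
     * The report shows uop counts, estimated IPC, and resource
     * pressure, even though the asm reuses the same few registers
     * every iteration. */
    long sum(const long *a, long n) {
        long s = 0;
        for (long i = 0; i < n; i++)
            s += a[i];
        return s;
    }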
>When the CPU executes 32 bit x86 code it "compiles" it to uops that are totally unrecognizable to us and use dozens to hundreds of registers.
Yes, but the x86 code itself can't address hundreds of registers, only a handful, because it assumes that's all there is (because that's all there was in the actual x86 processors way back). So the fact that you're using a complicated instruction decoder to get around this and make use of much more capable hardware underneath seems to me to be a big source of inefficiency: surely you would have more performance if your ISA could directly use the hardware resources, instead of needing a super complicated instruction decoder.
>So sure, Atom sucks at embedded, but x86 is still king at desktop and beyond, and ARM will never be able to challenge it.
ARM is already challenging it. They have ARM-64 servers now. Here's a place selling them, from a quick Google search:
https://system76.com/servers/starling
>If ARM wanted to challenge x86 in terms of single threaded performance, it would need to do the same thing x86 does: have a super complicated instruction decoder that maps the 32 logical registers defined by the ISA to its 128 physical ones, reschedules everything, renames stuff where appropriate, identify loads that can be elided, etc.
Ok, then why not just make a new (or at least extended) ISA that makes direct use of all those things, instead of needing a super complicated instruction decoder? We already have lots of different ISAs for embedded CPUs: ARM has all kinds of variants (ARMv7, ARMv9, etc.), and MIPS does too. For best performance, you have to compile for the exact ISA you're targeting. We don't do this for desktop stuff mainly because Microsoft isn't going to make 30 different versions of Windows, but for embedded systems it's perfectly normal because everything is compiled from source.
> ARM is already challenging it. They have ARM-64 servers now. Here's a place selling them,
TBH that's not challenging x86, any more than Atom is challenging ARM in the embedded space. The fact that they're for sale doesn't mean they're good. Those server CPUs have terrible performance per core.
> Ok, then why not just make a new (or at least extended) ISA that makes direct use of all those things, instead of needing a super complicated instruction decoder?
Directly accessing all the parts which are hidden behind the ISA is called VLIW, and the performance is terrible every time someone tries to reinvent it. It sucked even when Intel released the Itanium, which ran Windows.
The problem is that many of the data dependencies are dependent on timings which aren't available at compile time. (Integer division, for instance: its latency is sensitive to the values of its operands. To say nothing of cache timing.) A super complicated instruction decoder knows what data it has and what it doesn't while it is making decisions about which uops to dispatch on the half dozen or so lanes it's managing. A sufficiently advanced compiler does not, so a VLIW has to wait for all data in the half dozen or so lanes to become available before it is allowed to dispatch the instruction. If you want to do "interesting" rescheduling/renaming, you need to bring back the super complicated instruction decoder. (AFAIK later Itaniums started down the path of a complicated instruction decoder, but the Itanium was canned long before its complexity started approaching contemporary x86 standards. It would have been interesting to watch that develop.)
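A toy example (mine, not from any real VLIW toolchain) of a timing the compiler can't see:

    #include <stdint.h>

    /* The divide's latency depends on the operand values (y assumed
       nonzero), which only exist at run time. An out-of-order core
       dispatches the independent multiply while the divide is still
       in flight; a static VLIW schedule has to assume the worst-case
       latency instead. */
    uint64_t mix(uint64_t x, uint64_t y, uint64_t z) {
        uint64_t q = x / y;   /* variable, data-dependent latency */
        uint64_t w = z * 3;   /* independent work */
        return q + w;
    }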
I think you're fundamentally misunderstanding how much stuff the instruction decoder does. To be fair, I'm not doing it full justice (how can I? It won't fit), but I think you're too quick to assume all a CPU does is perform the assembly instructions fed to it. As the article states, modern computers aren't just fast PDP-11s.
> ARM is already challenging it. They have ARM-64 servers now. Here's a place selling them, from a quick Google search: https://system76.com/servers/starling
From the site: "Starling Pro ARM is currently unavailable. We’ll loop you in when we have more to share. Want to be the first to learn when it arrives? Get exclusive access before anyone else."
Which is sadly the typical story with ARM servers: limited availability.
Fewer architectural registers mean more expensive stack spills and reloads. 8 registers (really 7, and often 6) is too few; 16 is about right for the vast majority of programs.
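A quick toy illustration (compile it for 32-bit and 64-bit and compare the asm):

    /* Eight accumulators, plus the pointer and the index, are live
       across the loop. In 32-bit x86 (~6 usable registers) several of
       them get spilled to the stack and reloaded every iteration; in
       x86-64 (16 registers) they all stay in registers. */
    long sum8(const long *a, long n) {
        long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        long s4 = 0, s5 = 0, s6 = 0, s7 = 0;
        for (long i = 0; i + 8 <= n; i += 8) {
            s0 += a[i];     s1 += a[i + 1];
            s2 += a[i + 2]; s3 += a[i + 3];
            s4 += a[i + 4]; s5 += a[i + 5];
            s6 += a[i + 6]; s7 += a[i + 7];
        }
        return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7;
    }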
In theory memory can be renamed as well, but this was not done until the very latest Intel and AMD architectures and, IIRC, it still has to be conclusively confirmed.
>It's easy to prove x86 is lousy by seeing its success in embedded and mobile devices: it hasn't had any, despite trying (Atom). ARM reigns supreme here.
By that logic everything is lousy. x86 is lousy because it couldn't replace ARM. ARM is lousy because it couldn't replace x86.