LLVM Clang 12 Merges Support For x86_64 Microarchitecture Levels

jeffbee · on Nov 8, 2020

This email explains much better what is happening here.

https://sourceware.org/pipermail/libc-alpha/2020-July/116135...

dang · on Nov 8, 2020

Ok, we'll change to that from https://www.phoronix.com/scan.php?page=news_item&px=LLVM-Cla... above. Thanks!

alblue · on Nov 8, 2020

I found the phoronix summary better than the sourceware email. The email proposed a feature and had “it could look like …” bikedshedding discussions back in July — the original phoronix page summaried the conclusion of adding the feature, including what the names are now called.

It’s useful to know where something started, but the original reference was better at explaining the current situation.

dang · on Nov 8, 2020

Ok, I'll change it back. I don't know which article is better but the phoronix.com one is quite a bit more recent. Everyone who's curious should probably just read both.

CalChris · on Nov 8, 2020

However, the email has nothing to do with the post title. The email is talking about Linux distributions and doesn't even use the word clang or even llvm. It may have precipitated the change but it is not the change. Moreover, the Phoronix article cites the email, well cites their own article about Linux distros which cites the email.

jeffbee · on Nov 8, 2020

There is usually a much better primary source behind any post of Phoronix.

est31 · on Nov 8, 2020

What about Atom CPUs?

I have a netbook bought in 2015 that has an Atom processor. Right now it runs Ubuntu 20.04 LTS. Does this mean that distros will drop support for it?

IMO instead of dropping support, LLVM should get support for fat multi-versioned binaries, providing a graceful fallback if an extension is not available. You don't have to multi-version everything, only the functions which actually do end up getting autovectorized.

sgerenser · on Nov 8, 2020

Most Atom CPUs would support either the first or second target in this list.

zeotroph · on Nov 8, 2020

Using "-march=x86-64-v3" is still a decision which has to be made at compile time. I wonder if the Apple appstore (at least watchOS/tvOS) approach of using bitcode as the one-time artifact and then turning that into concrete instructions later would be possible for an entire Linux distribution (Gentoo II). The Mill CPU team wants to use something like that OS-wide to hide/emulate entire missing functionality on lower-tier CPU models.

Or the other possibility of compiling for all architecture and then selecting this or that function depending on what the current CPU is, or even hotpatching if the security model allows it. I remember either gcc or glibc having some support for one of these.

apocalypses · on Nov 8, 2020

One thing LLVM bitcode still can’t do is retain information about preprocessor directives, eg any platform specific code for AVX2 vs SSE4 etc. So unless you aim to write intrinsic free code it’s usually less performance/reliable to rely on compiler automatic vectorisation, which results in worse codegen overall.

jcranmer · on Nov 8, 2020

> One thing LLVM bitcode still can’t do is retain information about preprocessor directives, eg any platform specific code for AVX2 vs SSE4 etc.

LLVM supports per-function subtarget attributes, so you can compile individual functions with AVX2 support versus SSE4 support. The clang frontend even has a few different ways of triggering this support, with one method allowing you to specify per-CPU dispatch, and another merely specifying target attributes on a per-function basis.

wtallis · on Nov 8, 2020

I know GCC supports generating multiple versions of a function, compiled for different instruction set extensions. And this can also be done manually when you have hand-optimized implementations: https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Common-Functio... It's based on a GNU extension to the ELF format, not preprocessor directives.

I don't think any of that would conflict with using bitcode for everything you don't have a hand-optimized machine-specific implementation of.

firethief · on Nov 8, 2020

> I wonder if the Apple appstore (at least watchOS/tvOS) approach of using bitcode as the one-time artifact and then turning that into concrete instructions later would be possible for an entire Linux distribution (Gentoo II).

I could see this happening with WASM once its system interfaces and SIMD intrinsics mature... "write once, run anywhere"?

ip26 · on Nov 8, 2020

Even if you could just pick a few target march...

I mean, how many real world compute farms have exactly one march anyway?

dragontamer · on Nov 8, 2020

Its a dumb name change, but this is a dumb name change that should have happened a while ago.

While ARM has ARMv6, v7, v8... Intel has been using specific names: Nehelem, Haswell, and Skylake (not to be confused with Skylake-X).

A numerical naming scheme, of v1 (Core2), v2 (Nehelem), v3 (Haswell), and v4 (Skylake-X) will be easier to remember moving forward.

The_rationalist · on Nov 8, 2020

It's not a dumb name change, targeting skylale could generate illegal instructions on modern AMD cpus e.g they lacking TSX or SGX. This is about defining the common intersection of both vendors by generation

jcranmer · on Nov 8, 2020

> they lacking TSX or SGX.

Note that the SGX instructions are system only, so the compiler isn't generating them usually. The hardware lock elision instructions in TSX are specifically chosen so that they act as NOPs on processors that don't support TSX.

my123 · on Nov 8, 2020

Hardware lock elision was disabled via a microcode update. (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...)

throwaway2048 · on Nov 9, 2020

Not to mention TSX instructions have been broken in one way or another since their introduction, and now have been removed entirely in new intel silicon.

my123 · on Nov 8, 2020

And assuming those are present for a given gen is even dumber sadly.

There are Comet Lake CPUs with no AVX at all: all Pentiums/Celerons.

Every 10th generation Intel CPU onwards, at least on the client side, doesn't support TSX.

kevincox · on Nov 9, 2020

I think it would have been even easier to understand if they used the launch year of the architecture. Of course you may need a finer number (include the month or just a count if there are multiple launches in a year?) but I think it is worth it to get a rough idea of how old something is without finding a lookup table.

dragontamer · on Nov 9, 2020

The details of chip releases is more complicated than just a numerical year.

FMA4 extensions on Bulldozer are no longer supported, so a 2012-era CPU supports FMA4, but 2018-era Zen processors do NOT support FMA4.

Xeon Phi supported a whole slew of AVX512 stuff that no other chip will ever support. Intel experimented with some instructions, and decided against them moving forward.

The point of a "vague" number, like "v3" is just to provide a general idea. Its purposefully vague, because the details are simply too messy to keep track of.

kevincox · on Nov 9, 2020

I never suggested that year should be precise. I just think it is useful to use year as a vague number instead of something completely arbitrary. I think the only real downside is if people assume too much from that number.

> FMA4 extensions on Bulldozer are no longer supported, so a 2012-era CPU supports FMA4, but 2018-era Zen processors do NOT support FMA4.

I don't see how this is relevant. Isn't this the same problem that v4 supports something that v9 doesn't?

CalChris · on Nov 8, 2020

Is this an X86_64 backend command line option that is being documented in Clang? Are there any Clang specific changes?