Linker Notes on AArch64

a2800276 · on March 8, 2023

How refreshingly nerdy and technical! If I wasn't currently stuck figuring out an arcane linker problem I wouldn't even have glanced at this, but by chance, it's a breath of fresh air. Thanks for posting!

exabrial · on March 8, 2023

I have a dumb compiler question whose answer I should probably know, and I’m embarrassed to ask.

AArch64 really highlights how greatly CPU features differ. There are instructions available on some CPUs that do some amazing things, but are simply not available in very similar models. To an extent the same problem also exists in x86: for instance, there are AES functions that are not available on all CPUs.

When using an AOT language, how does one take full advantage of all cpu features at runtime? This seems like a place JIT would really outpace AOT.

saidinesh5 · on March 8, 2023

Just compile multiple libraries, one per target CPU architecture, and load the one you want to use (typically with dlopen/dlsym, etc.):

Example from whisper.cpp: https://github.com/ggerganov/whisper.cpp/blob/master/example...

            if (loadVfpv4) {
                Log.d(LOG_TAG, "Loading libwhisper_vfpv4.so")
                System.loadLibrary("whisper_vfpv4")
            } else if (loadV8fp16) {
                Log.d(LOG_TAG, "Loading libwhisper_v8fp16_va.so")
                System.loadLibrary("whisper_v8fp16_va")
            } else {
                Log.d(LOG_TAG, "Loading libwhisper.so")
                System.loadLibrary("whisper")
            }
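
For native code, the same idea can be sketched in C with dlopen. This is a minimal sketch, not the whisper.cpp implementation: the helper names load_first_available and load_whisper are hypothetical, the has_fp16 flag is assumed to come from a feature check done elsewhere, and the file names simply mirror the Kotlin snippet above.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Try each candidate library in order and return the first handle that
 * loads, or NULL if none does. If chosen is non-NULL, *chosen is set to
 * the name that loaded (or to NULL on failure). */
void *load_first_available(const char *const *names, size_t count,
                           const char **chosen) {
    for (size_t i = 0; i < count; ++i) {
        void *handle = dlopen(names[i], RTLD_NOW | RTLD_LOCAL);
        if (handle != NULL) {
            if (chosen != NULL) *chosen = names[i];
            return handle;
        }
    }
    if (chosen != NULL) *chosen = NULL;
    return NULL;
}

/* Hypothetical wrapper: prefer the FP16 build when the caller says the CPU
 * supports it, and fall back to the baseline build otherwise. */
void *load_whisper(int has_fp16, const char **chosen) {
    static const char *const with_fp16[] = {"libwhisper_v8fp16_va.so",
                                            "libwhisper.so"};
    static const char *const baseline[] = {"libwhisper.so"};
    return has_fp16 ? load_first_available(with_fp16, 2, chosen)
                    : load_first_available(baseline, 1, chosen);
}
```

The returned handle would then be passed to dlsym to look up whichever entry points the library exports. On older glibc versions this needs linking with -ldl.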

You can also just compile for x86 with all the features enabled, and then choose the functions you want to execute at runtime.

This way you don't need to deal with yet another JIT compiler.
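
A minimal C sketch of choosing functions at runtime inside a single binary, assuming GCC or Clang on x86-64. Instead of enabling every feature globally, it enables AVX2 for one function with the target attribute and checks the CPU with __builtin_cpu_supports before using it. The function names (sum_baseline, sum_avx2, sum_u32) are made up for illustration; on other architectures the sketch just uses the baseline.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sum_fn)(const uint32_t *, size_t);

/* Baseline version: built with the default flags, runs on any CPU. */
static uint32_t sum_baseline(const uint32_t *v, size_t n) {
    uint32_t total = 0;
    for (size_t i = 0; i < n; ++i) total += v[i];
    return total;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Same loop, but the compiler is allowed to use AVX2 in this function only. */
__attribute__((target("avx2")))
static uint32_t sum_avx2(const uint32_t *v, size_t n) {
    uint32_t total = 0;
    for (size_t i = 0; i < n; ++i) total += v[i];
    return total;
}
#endif

/* Pick the best implementation the running CPU supports. */
static sum_fn select_sum(void) {
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_baseline;
}

/* Public entry point: resolves the implementation on first use.
 * Note: this lazy initialisation is not thread-safe as written. */
uint32_t sum_u32(const uint32_t *v, size_t n) {
    static sum_fn impl = NULL;
    if (impl == NULL) impl = select_sum();
    return impl(v, n);
}
```

Callers only ever see sum_u32; the feature check happens once and every later call goes through the cached function pointer.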

Vogtinator · on March 8, 2023

The easiest option is runtime dispatch, for instance by using compiler support for function clones: https://lwn.net/Articles/691932/

kevincox · on March 8, 2023

Yes. To provide a quick summary, the compiler generates multiple versions of the code, then at runtime does feature detection and picks the best supported implementation. Generally this will only be done for particular functions or blocks that get a notable benefit from the extra instructions, to avoid too much code bloat (which is unlikely to significantly affect performance but will increase binary size).
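
As a rough illustration of function clones, here is a small C sketch using GCC's target_clones attribute. It assumes GCC on x86-64 Linux with glibc; the MULTIVERSIONED macro and the scale function are made up for this example, and on any other toolchain the macro expands to nothing so only one ordinary version is built.

```c
#include <stdlib.h> /* size_t; on glibc systems this also defines __GLIBC__ */

/* Only request clones where GCC's x86 multiversioning and glibc's ifunc
 * support are both expected; elsewhere the macro expands to nothing. */
#if defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__) && \
    defined(__GNUC__) && !defined(__clang__)
#define MULTIVERSIONED __attribute__((target_clones("avx2", "default")))
#else
#define MULTIVERSIONED
#endif

/* With the attribute active, GCC emits an AVX2 clone and a default clone
 * of this function, plus a resolver that picks one for the running CPU. */
MULTIVERSIONED
void scale(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * k;
    }
}
```

The source contains a single definition; the attribute is applied only to the hot function, which keeps the extra code size limited to the places that benefit.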

dxf · on March 8, 2023

>When using an AOT language, how does one take full advantage of all cpu features at runtime? This seems like a place JIT would really outpace AOT.

One model is to not try to take advantage of the features at runtime, but to compile per-platform versions of your binary (with whatever flags and optimization levels you want, support for sanitizers, etc.) and deploy those appropriately. E.g. build one binary that supports "generic" x86 variants, one optimized for haswell, etc. This model works well if you have control over how your binaries get deployed (though not so well if you don't), and especially well if you can also build the libc the same way and statically link it into your binary. (This last point is one reason Google is investing in LLVM's libc.)
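
In this model the feature set is fixed when the binary is built, so the code can branch on the compiler's predefined macros rather than on a runtime check. A small C sketch, assuming GCC or Clang; the function name build_target and the returned strings are made up for illustration:

```c
#include <stddef.h>

/* Report which per-platform build this is. The branch is resolved by the
 * preprocessor, so each deployed binary contains exactly one answer. */
const char *build_target(void) {
#if defined(__AVX2__)
    /* Defined when AVX2 is enabled at compile time, e.g. -march=haswell. */
    return "x86 with AVX2";
#elif defined(__x86_64__)
    return "generic x86-64";
#elif defined(__aarch64__)
    return "aarch64";
#else
    return "other";
#endif
}
```

Building the same source once with the default flags and once with -march=haswell would give two binaries that return different strings, which is the per-platform split described above.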

p_l · on March 8, 2023

GCC and ICC both support generation of function variants compiled with different flags - you can have a function compiled for a generic, lowest-common-denominator target, plus variants that depend on various optional features, and the appropriate one will be selected dynamically at runtime (IIRC this is done during dynamic linking).
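
On GNU/Linux, selection at dynamic-link time is typically built on indirect functions (ifunc). A hand-written C sketch of that mechanism, assuming GCC on x86-64 Linux with glibc; the names popcount_generic, popcount_hw, resolve_popcount and popcount64 are hypothetical, and on other platforms the sketch falls back to a plain function:

```c
#include <stdint.h>
#include <stdlib.h> /* on glibc systems this also defines __GLIBC__ */

typedef int (*popcount_fn)(uint64_t);

/* Portable variant: clears the lowest set bit until none remain. */
static int popcount_generic(uint64_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        ++count;
    }
    return count;
}

#if defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__) && \
    defined(__GNUC__) && !defined(__clang__)
/* Variant that may use the POPCNT instruction. */
__attribute__((target("popcnt")))
static int popcount_hw(uint64_t x) {
    return __builtin_popcountll(x);
}

/* Resolver: the dynamic loader calls this when it binds popcount64.
 * It runs before constructors, hence the explicit __builtin_cpu_init. */
static popcount_fn resolve_popcount(void) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("popcnt") ? popcount_hw : popcount_generic;
}

int popcount64(uint64_t x) __attribute__((ifunc("resolve_popcount")));
#else
/* No ifunc support assumed here: always use the portable variant. */
int popcount64(uint64_t x) {
    return popcount_generic(x);
}
#endif
```

After binding, calls to popcount64 go straight to the selected variant, with no per-call feature check.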

noloblo · on March 8, 2023

Thanks for this, @signa11 @maskray. While I believe the world is moving to Arm (AArch64/arm64), there is still quite a bit of work to be done in the LLVM, compiler, and Linux software delivery ecosystem to bring AArch64 up to par with x86_64.