I recently implemented a runtime for `__builtin_cpu_init()`, `__builtin_cpu_supports()`, and `__builtin_cpu_is()` for x86-64. Using these compiler intrinsics, or a higher level feature such as `[[gnu::cpu_dispatch]]`, you can write functions that behave differently on different CPUs. Fortunately the implementation isn't terribly complex. On x86, it's based around a neat `cpuid` instruction, and other ISAs have similar features.
https://github.com/Cons-Cat/libCat/blob/main/src%2Flibraries...