An alternative library, [cpuinfo](https://github.com/Maratyszcza/cpuinfo), is a similar offering but additionally exposes more information, such as cache sizes, topology (number of sockets, etc.), and details about an attached integrated GPU.
FYI, I'm the author of the library. I designed it with the help of the person who wrote the corresponding code for the Android NDK. We agreed that the end goal is to replace the Android-NDK-specific code with this library: the NDK will expose the exact same API but use this library under the hood.
> features are retrieved by using the cpuid instruction. Unfortunately this instruction is privileged for some architectures, in which case we fall back to Linux.
What's the rationale for making this instruction privileged for the other procs? I can kinda understand making system registers privileged reads but if you have your own encoding there's "no reason" not to give this to everyone, right?
The complex CPUID instruction is, to my knowledge, unique to x86 processors. The equivalents on other processors are essentially machine status registers with the feature bits set appropriately.
The main reasons these are privileged are probably:
a) The main use of these is checking whether you've got floating-point and/or vector-unit support. The OS needs to know this information anyway (imagine if your OS didn't bother to save/restore these registers on context switches!), so having the OS check it is not difficult.
b) There's generally already a fairly generic mechanism for accessing OS-level special-purpose registers that's fairly unbounded in size, so it's essentially free to make these specific registers addressable in that mode. Making some of them legal in user-space code costs extra hardware size and complexity, and (as mentioned earlier) the OS has to be modified anyway to take advantage of new features, so requiring the OS to report these details to userspace code isn't problematic.
c) RISC processors generally have a fairly full opcode space anyway. Taking up an opcode for a very niche feature that isn't particularly necessary isn't a good use of scarce encoding space.
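To make (b) concrete, here's a minimal sketch of how Linux reports such privileged feature bits to userspace on AArch64: the kernel reads the ID registers itself and publishes the bits through the auxiliary vector, so no user-readable register is needed. The HWCAP_AES bit value is as defined in the AArch64 <asm/hwcap.h>.

```c
/* Minimal sketch: querying kernel-reported HWCAP bits on AArch64 Linux.
   The kernel reads the privileged ID registers and exposes the result
   via the auxiliary vector; userspace never touches the registers. */
#include <stdio.h>
#include <sys/auxv.h>

#ifndef HWCAP_AES
#define HWCAP_AES (1UL << 3)  /* bit position per AArch64 <asm/hwcap.h> */
#endif

int main(void) {
  unsigned long hwcap = getauxval(AT_HWCAP);
  printf("AES instructions: %s\n", (hwcap & HWCAP_AES) ? "yes" : "no");
  return 0;
}
```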
1. Malware can infer details from CPUID to guess if it’s in a VM or not. Useful to avoid detection/analysis.
2. When processes execute in a legacy mode, you may trap on CPUID to hide details the legacy software won't understand or doesn't expect to exist. This avoids backwards-compatibility issues... better safe than sorry.
3. When developing SIMD-related libraries that need to be portable across multiple CPU generations, you may set up a CPUID mask (so trap, then hide features) to ensure compatibility on legacy computers.
Overall, having the ability to trap and rewrite the CPUID instruction is incredibly useful. The difference between denying and rewriting boils down to whether a callback is provided. Both features require disabling native CPUID execution.
CPUID on x86 is a user-level instruction. When you're doing processor-assisted emulation, CPUID does cause an exit to the hypervisor, which allows you to do CPUID emulation. You clearly don't need such instructions to be privileged to play emulation games with CPUID.
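For reference, a minimal sketch of that unprivileged access, using the __get_cpuid helper from GCC/Clang's <cpuid.h>; leaf and bit positions are from the Intel SDM:

```c
/* Sketch: CPUID runs fine in unprivileged user code on x86. */
#include <cpuid.h>
#include <stdio.h>

int main(void) {
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
    /* Leaf 1: EDX bit 26 = SSE2, ECX bit 28 = AVX (a complete AVX check
       also needs the OSXSAVE bit plus an XGETBV query). */
    printf("SSE2: %s\n", (edx & (1u << 26)) ? "yes" : "no");
    printf("AVX:  %s\n", (ecx & (1u << 28)) ? "yes" : "no");
  }
  return 0;
}
```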
> 3. When developing SIMD-related libraries that need to be portable across multiple CPU generations, you may set up a CPUID mask (so trap, then hide features) to ensure compatibility on legacy computers.
Huh? My impression was CPUID was exactly the opposite: using it properly lets you ensure compatibility on older computers while still getting maximum performance, since one can dynamically dispatch to an implementation that uses only what is supported. (E.g., switch between an SSE2 version and an AVX version.)
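A minimal sketch of that dispatch pattern, resolving once at startup via the GCC/Clang x86 built-ins documented downthread; kernel_avx and kernel_sse2 are hypothetical stand-ins for real kernels:

```c
/* Sketch: resolve once at startup, then call the chosen variant directly. */
#include <stdio.h>

static void kernel_avx(void)  { puts("AVX path"); }   /* hypothetical */
static void kernel_sse2(void) { puts("SSE2 path"); }  /* hypothetical */

static void (*kernel)(void) = kernel_sse2;  /* safe default */

int main(void) {
  __builtin_cpu_init();  /* GCC/Clang x86 built-in */
  if (__builtin_cpu_supports("avx"))
    kernel = kernel_avx;
  kernel();  /* all later calls go straight to the chosen variant */
  return 0;
}
```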
I think GP is saying that unit tests for such dynamically-dispatching code will want to be able to inject older CPUIDs, so the compatibility fallbacks have test coverage. If CPUID is a native instruction, it's harder to inject a fake value.
I'm not sure I find this compelling though. It's easy to make your own CPUID function that doesn't actually call "CPUID" in a test build.
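Something like this sketch, where the feature query goes through a swappable function pointer so a test can pretend to be an old CPU (all names hypothetical):

```c
/* Sketch: indirect feature queries through a pointer so tests can
   inject a fake "old CPU" without trapping the real CPUID. */
#include <stdbool.h>

typedef struct { bool avx2; } CpuFeatures;  /* hypothetical feature set */

static CpuFeatures QueryRealCpu(void) {
  return (CpuFeatures){ .avx2 = __builtin_cpu_supports("avx2") != 0 };
}

static CpuFeatures QueryFakeOldCpu(void) {  /* test stub: pre-AVX2 CPU */
  return (CpuFeatures){ .avx2 = false };
}

/* Production code calls query_cpu(); a test build repoints it at the stub
   so the compatibility fallback paths get coverage. */
static CpuFeatures (*query_cpu)(void) = QueryRealCpu;
```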
Intel has really made a mess out of CPUID. It's supposed to be an easy way to query CPU features, and for some things it is, but they keep changing the way it works, especially for newer features like AVX2. It's like they can't even follow their own APIs.
Wonder if dmidecode could be updated to include more CPU details; the cache features and other flags it reports are limited. For example, here's its current output for my CPU:
Version: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Voltage: 0.8 V
External Clock: 100 MHz
Max Speed: 2900 MHz
Current Speed: 2300 MHz
Status: Populated, Enabled
Upgrade: Other
L1 Cache Handle: 0x004C
L2 Cache Handle: 0x004D
L3 Cache Handle: 0x004E
Serial Number: To Be Filled By O.E.M.
Asset Tag: To Be Filled By O.E.M.
Part Number: To Be Filled By O.E.M.
Core Count: 4
Core Enabled: 4
Thread Count: 8
The CPU you compile on will probably not be the CPU your program runs on. If you know which CPU you're running on, I guess you can just make assumptions about what features that chip supports, but it's much safer to use something like this to verify at runtime that AVX, or whatever features you need, are actually supported.
I was getting at cross-compilation, where you do know the CPU you're running on, but I see that this breaks down because the binaries can run on CPUs that support different features.
Even for same-architecture compilation, the host's feature set is rarely that of the targets. If you're distributing binaries, you generally want to target a safe, old minimum baseline to maximize your user base: for x86-64, requiring just SSE2 is safe [1] and often sufficient [2]. Compile farms and developer machines tend to be newer hardware.
[1] The x86-64 ABI actually requires SSE2 extensions to work correctly.
[2] SSE2 added double-precision and vectorized integer support to the SSE registers, the former allowing you to replace x87 FPU usage for floating point (unless you need long double, which is extremely rare). The newer SSE sets generally add only specialized operations that are unlikely to show up in autovectorization anyway, and the wider vectors of AVX are less useful for performance in the "it might be useful" autovectorization scenario. They are useful for specific, known-hot regions of code; when distributing binaries, variants of those regions are built for different feature levels and dynamically selected based on the user's actual hardware.
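GCC can automate that last approach with function multi-versioning; a minimal sketch using the target_clones attribute (x86 ELF targets, GCC 6 or later; dot is a hypothetical kernel):

```c
/* Sketch: GCC emits one clone per listed target and installs a resolver
   that picks the best variant for the running CPU at load time. */
__attribute__((target_clones("avx2", "sse4.2", "default")))
double dot(const double *a, const double *b, int n) {
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    s += a[i] * b[i];  /* simple loop the compiler can autovectorize */
  return s;
}
```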
To add to the other comments: not only is the machine where you compile often not the same as the one where you run your software, but consider the case where you want your software to run on as many CPUs as possible while still making use of advanced features where they're supported.
You'd tell the compiler "target a machine with <basic instruction set>", so it's not allowed to use advanced features like FMA, because it can't assume they're supported. With this library, you check at run time whether FMA is supported and pick the appropriate function accordingly, as sketched below.
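Roughly like this sketch, adapted from the query-once pattern in the cpu_features README; fma3 is my assumption for the FMA flag's field name:

```c
/* Sketch adapted from the cpu_features README: query once, then branch. */
#include "cpuinfo_x86.h"  /* from google/cpu_features */

static const X86Features features = GetX86Info().features;

void Compute(void) {
  if (features.fma3) {  /* field name is an assumption */
    /* run the FMA-enabled kernel */
  } else {
    /* run the portable fallback */
  }
}
```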
Yes, I realized that I didn't take that into account. A follow-up, though: could an executable loader find and "trim" implementations that aren't optimized for the processor it's running on, if they're properly marked in the binary?
> Just in case anyone is tempted, please do not write code that assumes the host on which it is compiled will always be the host on which it will be executed.
Scroll down to get to the table that includes Physical Processor, Intel AVX, Intel AVX2, etc.
T2 instances do run on a number of different processor models, so they are listed as "Intel Xeon Family." M4 instances run on the Intel Xeon E5-2676 v3 (Haswell) or the Intel Xeon E5-2686 v4 (Broadwell), though m4.10xlarge only runs on Haswell and m4.16xlarge only runs on Broadwell. Pay close attention to the * in the table for those instances that run on either Haswell or Broadwell.
Generally you will find that recent generations of the C and R EC2 instance families have identical CPUs within a generation.
__builtin_cpu_supports is supported:
* on x86 since GCC 4.8
* on PowerPC since GCC 6 (you will also need glibc ≥ 2.23)
Documentation:
https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.ht...
https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Built-in-Function...
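A minimal sketch of how that looks in practice on x86; compute_avx2 and compute_generic are hypothetical stubs:

```c
/* Sketch: runtime dispatch with GCC's __builtin_cpu_supports. */
#include <stdio.h>

static void compute_avx2(void)    { puts("AVX2 path"); }     /* stub */
static void compute_generic(void) { puts("generic path"); }  /* stub */

int main(void) {
  __builtin_cpu_init();  /* initialize the CPU model/feature cache */
  if (__builtin_cpu_supports("avx2"))
    compute_avx2();
  else
    compute_generic();
  return 0;
}
```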