Where N is the number of physical cores? I do not use hyper-threading (it tends to hurt the floating-point and bandwidth-limited operations that I run), but I usually find minimal compile times at N+1 jobs, with little penalty for several more.
The optimal number of concurrent build jobs depends on many factors, but the bottom line is that you want to maximize CPU utilization and minimize context switching.
If one extra concurrent job is enough to keep the CPU busy while other jobs are blocked waiting on I/O, then you are fine.
So the factors to think about are:
- your I/O throughput for writing the generated object files;
- the complexity of the code being compiled: template-rich C++ code has a much higher ratio of CPU usage to I/O.
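Since the best job count depends on the machine and the code base, the simplest way to find it is to measure. Below is a minimal sketch, assuming a project that builds with `make` and has a `make clean` target. The function names (`time_build`, `candidate_job_counts`, `best_job_count`) are made up for illustration, and the physical core count is passed in by hand because the Python standard library only reports logical CPUs.

```python
import subprocess
import sys
import time


def time_build(jobs, build_cmd=("make",), clean_cmd=("make", "clean")):
    """Do a clean build with -j<jobs> and return the wall-clock seconds.

    Assumes the project builds with `make` and has a `clean` target.
    """
    subprocess.run(list(clean_cmd), check=True, stdout=subprocess.DEVNULL)
    start = time.perf_counter()
    subprocess.run([*build_cmd, f"-j{jobs}"], check=True,
                   stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def candidate_job_counts(physical_cores, extra=3):
    """Job counts to try: from N-1 (at least 1) up to N+extra."""
    return list(range(max(1, physical_cores - 1), physical_cores + extra + 1))


def best_job_count(physical_cores, timer=time_build):
    """Time each candidate once and return (fastest job count, all timings)."""
    timings = {jobs: timer(jobs) for jobs in candidate_job_counts(physical_cores)}
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    # The number of physical cores is given on the command line.
    cores = int(sys.argv[1])
    best, timings = best_job_count(cores)
    for jobs, seconds in sorted(timings.items()):
        print(f"-j{jobs}: {seconds:.1f} s")
    print(f"fastest: -j{best}")
```

A single run per job count is noisy (disk cache state alone can change the result), so repeat the measurement a few times before trusting small differences.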
Sparse matrix kernels and finite element/volume integration. For bandwidth-limited operations, it is sometimes possible to get better performance by using fewer threads than physical cores because the bus is already saturated (for examples, see STREAM benchmarks). For dense kernels, I'm usually shooting for around 70 percent of peak flop/s, and any performance shortcomings come from required horizontal vector operations, data dependence, and multiply-add imbalance. These are not things that HT helps with.
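To make the saturation argument concrete, here is a rough sketch of the simplest possible model: each thread can pull a fixed bandwidth until the shared memory bus is full, after which extra threads add nothing. The function names and the numbers in the usage example are hypothetical, not measurements. The model also only shows the plateau; it does not capture the contention that can make additional threads actively slower.

```python
import math


def aggregate_bandwidth(threads, bus_gbs, per_thread_gbs):
    """Total achievable bandwidth in GB/s under a simple saturation model.

    Assumes every thread can sustain per_thread_gbs on its own and that
    the shared bus caps the total at bus_gbs.
    """
    return min(threads * per_thread_gbs, bus_gbs)


def threads_to_saturate(bus_gbs, per_thread_gbs):
    """Smallest thread count that reaches the bus limit in this model."""
    return math.ceil(bus_gbs / per_thread_gbs)


if __name__ == "__main__":
    # Hypothetical machine: 8 physical cores, 40 GB/s bus, 12 GB/s per thread.
    bus, per_thread = 40.0, 12.0
    for t in range(1, 9):
        print(t, aggregate_bandwidth(t, bus, per_thread))
    print("saturated at", threads_to_saturate(bus, per_thread), "threads")
```

With these made-up numbers the bus is full at 4 of 8 cores, so a bandwidth-limited kernel gains nothing from the other 4, let alone from HT siblings.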
Additionally, HT affects benchmark reproducibility, which is already bad enough on multicore x86 with NUMA, virtual memory, and funky networks. (Compare to Blue Gene, which is also multicore but uses no TLB (virtual addresses are offset-mapped to physical addresses), has almost independent memory bandwidth per core, and has a better network.)