GCC Automatic Parallel Compilation Viability Results Help Up to 3.3x Speedup (phoronix.com)
69 points by rbanffy on Sept 4, 2020 | 27 comments



I regularly work with multi-thousand-line source files, but compilation speed has never really been an issue there, at least for me. The issue has always been the link time afterwards (especially if you turn LTO on). I guess it's probably more of an issue for C++.

Is anyone working on adding parallel linking to GCC like clang has?


I agree. In all of the large projects I have worked on, the build systems were constantly optimized under the observation that compilation is infinitely parallelizable and latency doesn't matter nearly as much as throughput, but the link is the expensive step on the critical path.


Linking improvements would have far-reaching benefits. I have similar observations to yours regarding both C++ and Rust.


I'm not familiar with clang's support for parallel linking, so maybe this isn't what you mean, but GCC supports parallel ltrans for LTO, and IIRC GCC 10 made multithreaded ltrans the default behavior.
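For reference, this is roughly what that looks like on the command line (file names are made up; -flto=auto needs GCC 10 or newer):

    gcc -O2 -flto -c a.c b.c
    gcc -O2 -flto=auto a.o b.o -o app   # ltrans jobs run in parallel at link time

There's also -flto=jobserver, which cooperates with make -j instead of spawning its own set of jobs.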


Yeah, I could just be behind. I try to at least know what GCC is up to, but for one of the platforms I work with we're still stuck on GCC 4.9 :(


For C++ it really does become an issue, since each individual file can take over a second to compile (largely depending on how heavy the template usage is), even for moderately sized files (~1000 lines).


Projects using templates and heavy metaprogramming often see most of the compile time spent in template instantiation, not in linking. For these kinds of projects, I think this would help.
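If you want to confirm where the time goes on a particular file, both compilers can tell you (heavy.cpp is just a placeholder name here):

    g++ -c -ftime-report heavy.cpp     # per-phase timing summary from GCC
    clang++ -c -ftime-trace heavy.cpp  # writes heavy.json for chrome://tracing

Clang's -ftime-trace in particular breaks the time down per template instantiation, which makes the expensive ones easy to spot.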


Try building the SQLite3 amalgamation.
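For anyone who hasn't tried it: the amalgamation is a single sqlite3.c file of well over 100k lines, so it makes a handy single-TU benchmark, e.g.:

    time gcc -O2 -c sqlite3.c -o sqlite3.o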


Very impressive work for a student; the GCC codebase isn't known for being approachable.

Can we expect Clang to implement the same trick in a few months' time?


No.

LLVM is not amenable to using multiple threads to compile a single TU. The use-lists of global values (such as functions, global variables, or constant expressions) include all uses from all functions, so parallelizing on a per-function basis requires acquiring locks (or some sort of lock-free data structure) to add, remove, or iterate these lists, which would add considerable overhead to a relatively common operation.

If you want more details, you can read the recent thread on llvm-dev discussing this: http://lists.llvm.org/pipermail/llvm-dev/2020-March/139606.h...
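To make the contention concrete, here is a rough sketch (illustration only, not LLVM's actual code) of what a per-function parallel pass would have to guard:

    // Every use of a global, from every function in the module, is threaded
    // onto one intrusive use-list hanging off that global value.
    #include "llvm/IR/Function.h"
    #include <mutex>

    static std::mutex UseListLock;  // hypothetical guard a parallel pass would need

    static unsigned countCallers(llvm::Function &F) {
      // Walking F's users traverses the module-wide use-list. If another
      // thread is simultaneously adding or erasing a call to F from some
      // other function, this traversal races unless it takes the lock
      // (or the list is made lock-free), and users() / replaceAllUsesWith
      // are hot paths in almost every transformation.
      std::lock_guard<std::mutex> Lock(UseListLock);
      unsigned N = 0;
      for (const llvm::User *U : F.users()) {
        (void)U;
        ++N;
      }
      return N;
    }

So it's not one unlucky data structure; use-lists are touched constantly.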


Let's certainly hope Clang will eventually manage to improve things in that context, because the lack of parallelism during the optimization step when there is a single compilation unit is quite punishing.


Doesn't clang already implement this?

LLVM can process multiple codegen units in parallel already, and many frontends split TUs into multiple codegen units.
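rustc is a concrete example of that: it splits each crate into multiple codegen units and hands them to LLVM in parallel, and the split is tunable, e.g.:

    rustc -C opt-level=2 -C codegen-units=16 main.rs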


It's great that C++ compiler teams are working on reducing C++ compile times. The next feature I think would be a game changer is caching the results of compilation (templates and binary code). A fine-grained cache acting at the level of a line (or a function, if that's too hard) would save C++ devs an incredible amount of time, while also simplifying build systems. It would also stop engineers from losing days trying to speed up their builds.


Such a compiler exists: it's called zapcc (it was a fork of Clang 3.something). Sadly it has been abandoned and never merged upstream, even though it was open-sourced.


The GitHub page states:

> When was the source last merged with LLVM trunk?

> This open-source release was last merged with LLVM 325000 on 2018-02-13.


We already had function-level compilers back in the early days, Energize C++ and VisualAge for C++ v4, but they were too resource-hungry for what companies were willing to pay, and they died.

There is an Energize C++ demo floating around on YouTube.


Does ccache or sccache help?

I rarely touch C/C++, but when I do, I've been having to git bisect stuff. Having ccache in between has been invaluable in reducing the run time.
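For anyone else reading along, the setup is just to put ccache in front of the compiler; either of these works for typical projects:

    CC="ccache gcc" CXX="ccache g++" make -j"$(nproc)"
    cmake -S . -B build -DCMAKE_CXX_COMPILER_LAUNCHER=ccache

ccache hashes the inputs (source, headers, flags, compiler), so hopping between commits with git bisect keeps hitting the cache whenever those inputs haven't changed.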


This was being touted for the Fedora build system recently, and Jeff Law made typically sensible remarks to temper the enthusiasm. I don't have a link to the devel archive to hand.


What is automatic here? It's not clear from the article what this means.


I suppose it's more automatic than manually splitting up your source to facilitate parallel compilation, but I agree, it's not very clear.

Perhaps it would have been better to describe it as intra-process parallel compilation in GCC, or something along those lines.


Imagine you have one source file with 100 functions that you want to compile. Traditionally it can only be compiled using one core, processing function after function. Assuming we are working on already-preprocessed input, there's no theoretical reason more cores couldn't be used to compile the functions: e.g. with 4 cores, each could get 25 functions. The practical objection is not just that this adds code complexity even in the trivial case (where all functions take the same time to compile), but that in the non-trivial case you end up reimplementing much of what the OS and the make system already do together when separate processes compile separate files, dynamically deciding what to work on at which point.

This experiment tries to make the case that something in that direction is doable and worth doing. What I missed, though, was a good proof of the latter based on an analysis of the state of existing projects. I would personally rather consider the completely opposite direction:

In big C++ projects, most functions already live in small compilation units, and the build process spends most of its time reprocessing the same huge set of headers for every compilation unit. Often the whole build would be faster if more compilation units were compiled as one(!), provided the headers aren't used in a way where their semantics change depending on how they are included (I've heard that LLVM, of all projects, actually does that unfortunate thing, but I haven't spent time analyzing it myself).


Compiling entire components as one unit was standard practice when dealing with the (defunct) IBM VisualAge C++ compiler. On one of our larger projects, updating the makefile to concatenate all the .cpp's in each component and then compiling those concatenated files reduced build times from 15-20 hours down to 1-2 hours.
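For anyone who hasn't seen the trick: the concatenation can be as simple as a generated "unity" file that #includes the component's sources (file names made up), at the cost of watching out for clashes between file-local names:

    // widget_unity.cpp (generated): headers are parsed once instead of once per .cpp
    #include "widget.cpp"
    #include "widget_painter.cpp"
    #include "widget_io.cpp"
    // beware: static helpers or anonymous-namespace names spelled the same
    // in two of these files now collide in a single TU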


With CMake it's just a matter of setting CMAKE_UNITY_BUILD=1
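For example (CMake 3.16 or newer; it can also be enabled per target via the UNITY_BUILD property):

    cmake -S . -B build -DCMAKE_UNITY_BUILD=ON
    cmake --build build -j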


Well, I would assume you'd also have to make sure your code is safe to build under that kind of inclusion.


One complication is inlining, where some of those 25 functions may want to inline functions handled by some other process.


Google: Let's start all of our projects on a foundation built by an intern.


Google Summer of Code participants are not interns; they are paid a stipend to work on open-source projects. And isn't it good if such a project ends up being used?



