Hahaha, fuck me, CPUs are fast. That's wicked. 15 mins. A billion lines of C. Insane. Wonder if there's some IO speed to be gained from ramdisking the inputs.
It's been pointed out that the Threadripper does worse per core under full load than even high-core-count consumer CPUs like the 3950X/5950X. That's the tradeoff you make for a huge core count. Four 3950Xs might do better, but then you need to build three more PCs and, for actual processing tasks, coordinate work across multiple systems.
Lots of single-pass compilers can achieve 1 MLOC/s. But the main problem is that C++ has an O(n^2) compilation model due to header explosion: every translation unit re-parses all the headers it includes, and the set of shared headers tends to grow with the project. On top of that, expanding C++ templates is very computationally intensive.
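You can see the blowup even in plain C, before templates enter the picture. A minimal illustration (the line counts are ballpark figures for a typical glibc system, not measurements):

    /* hello.c: three lines of "our" code. `gcc -E hello.c | wc -l`
       shows the parser is actually handed on the order of a thousand
       lines for this one include; a C++ <iostream> expands to tens
       of thousands. Multiply by every translation unit that
       includes it. */
    #include <stdio.h>

    int main(void) {
        printf("hello\n");
        return 0;
    }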
TCC is fast enough that you can recompile the kernel every time you boot up.
I remember playing with TCC-boot back in 2015, and on a relatively beefy machine at the time I could compile the kernel in 5 seconds (about 37 MB/s, IIRC).
I can build the D compiler (500k lines?), warts and all, in a second on my machine - and that's code that's not particularly complicated, but realistically not optimized for compile times at all.
To be fair, because DMD can't depend on Phobos, it doesn't use a lot of the features that end up making D compiles slow, like lambdas and deep metaprogramming. DMD is, in a way, one of the least representative examples of a large D project.
V8 is a JIT compiler with a baseline interpreter and progressively more optimized tiers. It's not surprising that interpreters have fast startup times. (HotSpot is the primary exception here, but a big part of its slowdown comes from the fact that it verifies bytecode before loading it.)
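The tier-up mechanic itself is simple. A toy sketch of the shape (all names invented, and a real engine swaps in actual generated machine code rather than a second C function):

    #include <stdio.h>

    typedef long (*impl_fn)(long);

    static long square_interp(long x);  /* cheap tier: ready instantly */
    static long square_opt(long x) { return x * x; }  /* stands in for
                                                         JIT output */

    static impl_fn square_impl = &square_interp;
    static long square_calls = 0;

    static long square_interp(long x) {
        /* pretend this walks bytecode; promote once the function
           proves hot */
        if (++square_calls > 1000)
            square_impl = &square_opt;  /* tier-up */
        return x * x;
    }

    int main(void) {
        long sum = 0;
        for (long i = 0; i < 10000; i++)
            sum += square_impl(i);      /* dispatch through current tier */
        printf("%ld\n", sum);
    }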
Bytecode verification is not a large part of compilation time for Java. For HotSpot's server compiler, verification is dwarfed by building the IR, repeatedly walking and transforming it, generating code, and doing register allocation. Keep in mind that HotSpot will very likely inline a ton of methods, making the compilation unit large, and many passes are at least slightly superlinear, so compilation time explodes.
That said, I ported C1 (the client compiler) to Java back in my Maxine VM days, around 2009. At the time, on typical x86 processors, it was easily hitting 1.8 MB/s of bytecode. You can work backward to how many lines of code per second that is (and sure, you aren't paying the cost of the frontend), but yeah, I think that's in the ballpark of 1 MLOC/s. And that's with an optimizing compiler with IR and regalloc!
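Back-of-envelope, to make that concrete: if javac emits somewhere around 2-4 bytes of bytecode per line of source (a guess, not a measurement), then 1.8 MB/s works out to roughly 450k-900k lines per second - the same order of magnitude as 1 MLOC/s.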
I wonder if it'd be possible / useful to implement a Rust or C compiler that worked that way. Essentially instant startup after a code change is a delightful feature.
I've been bandying around ideas lately for an extremely fast C compiler that, among other things, caches machine code for individual function declarations. You wouldn't realistically be able to match the speed of JS, though, because JS is a dynamic language. With a statically typed language like C or Rust, any time you change a declaration, you have to re-typecheck everything that depends on it. (Though you can frequently avoid generating new code if you implement structs with hashtable semantics.) Both languages also have compile-time evaluation, and Rust in particular has to deal with template expansion (which is not trivial; I hear they use an SMT solver for it now).
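The caching part might look something like this - a content hash over exactly the inputs that can change a function's machine code (everything here is hypothetical, and a real version would hash token streams rather than strings):

    #include <stdint.h>
    #include <stdio.h>

    /* FNV-1a, standing in for any decent content hash. */
    static uint64_t fnv1a(uint64_t h, const char *s) {
        while (*s) { h ^= (unsigned char)*s++; h *= 0x100000001b3ULL; }
        return h;
    }

    /* A function's object code depends on its own tokens plus the
       *signatures* of the declarations it references - not their
       bodies - so hash exactly those and reuse cached code on a hit. */
    uint64_t function_cache_key(const char *body_tokens,
                                const char *const *dep_sigs, int ndeps) {
        uint64_t h = 0xcbf29ce484222325ULL;  /* FNV-1a offset basis */
        h = fnv1a(h, body_tokens);
        for (int i = 0; i < ndeps; i++)
            h = fnv1a(h, dep_sigs[i]);
        return h;                            /* index into on-disk cache */
    }

    int main(void) {
        const char *deps[] = { "int add(int, int)" };
        printf("%016llx\n", (unsigned long long)function_cache_key(
            "int twice(int x) { return add(x, x); }", deps, 1));
    }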
Strictly speaking, Rust doesn't have templates or "template expansion", but there is a thing called "trait resolution" which does take effort to solve; it's related to, but not the same as, what C++ calls "template expansion."
That sounds like a fun idea! I've been toying for years with the idea of a fully incremental compiler - basically caching the dependency DAG of code compilation. "This function depends on the definition of this struct, which in turn depends on these typedefs and this other struct..."
Then I can imagine a compiler which takes the DAG from the previous compilation plus information about which lines of code have changed, and figures out which intermediate compilation results need to be recomputed. The result would be a set of blocks which change - globals, symbol table entries, and function definitions. And then you could implement a linker in a similar way - consuming that delta plus information about the previous link to figure out where and how to patch the executable from the previous run.
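The core of that is just dirty-bit propagation over the DAG. A toy sketch (the data layout is made up; real nodes would carry a hash of each declaration's token stream):

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_DEPENDENTS 16

    /* One node per declaration; edges point from a definition to the
       things that were compiled against it. */
    typedef struct Node {
        const char  *name;
        struct Node *dependents[MAX_DEPENDENTS];
        int          ndependents;
        bool         dirty;
    } Node;

    static void mark_dirty(Node *n) {
        if (n->dirty) return;              /* already visited */
        n->dirty = true;
        for (int i = 0; i < n->ndependents; i++)
            mark_dirty(n->dependents[i]);  /* changes flow downstream */
    }

    int main(void) {
        Node fn = { "render()", {0}, 0, false };
        Node st = { "struct Vec", { &fn }, 1, false };
        mark_dirty(&st);  /* this struct's definition changed */
        printf("recompile %s: %d\n", fn.name, fn.dirty);  /* -> 1 */
    }

Only the dirty function nodes go back through codegen; everything else is reused from the previous run's cache.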
The end result would be a compiler which only needs to do work proportional to how much machine code is affected by a change. So if you're working on Chromium and you change one function in a .cpp file, the incremental compilation should be instant, since it would only need to recompile that function, patch the binary with the change, and update some jmp addresses. It would still be slow if you changed a struct that's used all over the place - but it wouldn't need to recompile every line of code that transitively includes that header, like we do right now.
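The patching step can stay simple if you allow jump islands. A sketch, assuming x86-64 and a completely hypothetical image layout:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* If the recompiled function fits in its old slot, overwrite it
       in place; otherwise append it to the image and turn the old
       entry point into a 5-byte relative jmp, so every existing call
       site keeps working without relocation. */
    void patch_function(uint8_t *image, size_t old_off, size_t old_size,
                        const uint8_t *newcode, size_t new_size,
                        size_t *image_end) {
        if (new_size <= old_size) {
            memcpy(image + old_off, newcode, new_size);
        } else {
            size_t new_off = *image_end;
            memcpy(image + new_off, newcode, new_size);
            *image_end += new_size;
            int32_t rel = (int32_t)(new_off - (old_off + 5)); /* jmp rel32 */
            image[old_off] = 0xE9;                            /* jmp opcode */
            memcpy(image + old_off + 1, &rel, sizeof rel);
        }
    }

It's the same trampoline trick hot-patching tools use, and it means existing call sites never have to be touched.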