This is based on clang --- it's not a new compiler.
<tangent>
Are there any working open source C++ compilers which aren't based on gcc or clang? I know of TenDRA, which appears to have ceased to exist and was always kind of incomprehensible and non-working, and apparently it never got as far as STL support; there's Path64, whose github page no longer contains the compiler repo; there's Open64, whose website no longer exists (although there does seem to be a daughter project, OpenUH, which did a release last year)... Is there anything else?
For C++? No.
Too hard of a language to waste phd students maintaining :)
Over the years there have been a few more (openC++ is a good example, ROOT used to be one too), but they are all dead now AFAIK.
One reason is that research in compilers is pretty much not in the frontends anymore. For any research still being done that uses C++ as a base, gcc/clang/llvm pretty much work fine. LLVM in particular has an IR that works for most researchers, is not hard to understand, etc.
For people who want to try to build larger solutions, they usually start with C (see, for example, libfirm and friends).
Commercial folks use EDG, though a lot of them are also starting to use clang as a frontend now.
(IBM was the other major company that had their own C++ parser for a longer time)
Bit of a side-note, but I would disagree about ROOT/CINT ever being a C++ compiler. It was an interpreter for a language that looked somewhat related to C++, if you squint quite a bit.
* All variables were hoisted up to function scope. This meant that you couldn't use "int i" in one loop and "unsigned int i" in the next (see the sketch after this list). This also caused destructors to be incorrectly delayed until the end of the function.
* Use of templates required pre-compiled dictionaries for each type the template might be instantiated for. Any templates occurring in interpreted code would be silently ignored.
* Incomplete standard library implementation. std::abs(long) is missing, for example.
* const is silently ignored.
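To make the first point concrete, here is a small, hypothetical example; it is perfectly valid standard C++, but CINT's hoisting rule turned it into a redefinition error:

    void loops() {
        for (int i = 0; i < 3; ++i) {
            // fine in standard C++: this 'i' is scoped to the loop
        }
        for (unsigned int i = 0; i < 3; ++i) {
            // also fine in standard C++, but CINT hoisted both declarations
            // to function scope, so it saw two conflicting 'i' variables
        }
    }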
Not entirely relevant to the current discussion, but good heavens, I am glad that monstrosity is gone.
I think the reason why there are so many C/C-like compilers, but very few C++ ones (I personally don't know of any open-source ones besides gcc and clang), is the complexity of the language; C is simple enough that a single person can write a compiler for it in a short time and have it compile both itself and large quantities of existing standard and not-so-standard code.
In contrast, C++ is at least an order of magnitude more complex, with far more features that need to be working to get to something which could usefully be considered 'C++'. For example, templates are pretty integral to the language and implementing them correctly is not trivial.
The number of people who can write a whole C compiler, while relatively many, is still a tiny fraction of all programmers. (I do wish there were more, since I personally think the concepts aren't so hard once you see the essence of the tiny compilers like tcc or C4.) And a much tinier fraction of them will want to try writing one for even a subset of C++.
There was OpenWatcom, but it doesn't support anything past C++98, as far as I know.
There's a really high barrier to entry for C++ compilers, and between Clang, GCC, and all the proprietary compilers there's not a huge need for another one.
The tl;dr from reading between the lines is that having the open source version was crippling their commercial sales, so they tried to make it go away.
There's a nightly download from http://www.pathscale.com/ekopath-compiler-suite that makes you click through a GPLv2 license (the GPL is not an EULA, dammit!), but there's no source included or any link to source.
But I did eventually track down a clone of the original github version here:
I see. So basically they wanted to open it, but changed their mind. How does the current PathScale handle the GPL, though? Or have they never updated the compiler since then?
While we were shipping the compiler at the original PathScale, the source was distributed to everyone we distributed a copy to (GPLv2 clause 3a). I'm not sure why you'd mention github, given that they didn't exist yet. And I have no idea what you mean by "official site" when we're talking about a company that ceased business in 2006.
p.s. if you wonder why I'm so vehement about this topic, it's because you're basically accusing real people of being unethical based on your inability to find things now that we were required to give away in the past.
While GPLv2 clause 3a keeps things open source, it usually means including the source with the product, in the form of something like a CD-ROM.[1]
>3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:
>a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange
Do you know if the source is downloadable, legally from anywhere? Or was it only distributed to original customers?
I have no idea if the source is downloadable, legally, from anywhere. That's not required by the license. I'm glad that you actually looked at the license before you asked!
Sorry to disappoint you, but CD-ROMs aren't customarily used for software interchange anymore, unless you are collecting antiques. So again, where can I find the code for the present-day PathScale?
> This is based on clang --- it's not a new compiler.
This is good. Apparently it improves compile speed by a factor of 2-10 across a wide range of uses without compromising output quality. For many C++ developers it will be like a nitro injection.
Not open source, and actually it's not free, either. People sure think it is, but that's only for students at degree-granting institutions, and people working on open-source code who are not being paid.
Academic researchers can get a free library license, but not a compiler license.
Ditto Digital Mars C++ (née Symantec C++, née Zortech C++). But as I recall, like most other "alternative" C++ compilers, it's also stuck in C++03 land.
DMC++ is still awesome, because it's the only ISO C++ compiler out there that can target any kind of DOS platform and binary format, from 16-bit .COM files (!!!) to 32-bit DOS extender.
It was my understanding that the advantage of Intel's compiler is that it would optimize for the strengths available on newer processors while still allowing it to work on older processors (for example, using AVX instructions if available and a slower branch if not) [aka, the "CPU dispatcher"]. Agner wrote about it "crippling" AMD processors because they didn't say they were "GenuineIntel" back in 2009.[0]
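For illustration, here's a rough sketch of what such a dispatcher amounts to, written by hand with the GCC/clang x86 builtin __builtin_cpu_supports (Intel's compiler generates the equivalent automatically, which is where the GenuineIntel check came in; the function names below are made up):

    #include <cstddef>

    // Baseline implementation, compiled for the lowest-common-denominator ISA.
    static float dot_generic(const float *a, const float *b, std::size_t n) {
        float s = 0;
        for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
        return s;
    }

    // In a real dispatcher this would be a separately compiled AVX version;
    // it reuses the generic code here only to keep the sketch self-contained.
    static float dot_avx(const float *a, const float *b, std::size_t n) {
        return dot_generic(a, b, n);
    }

    float dot(const float *a, const float *b, std::size_t n) {
        // Runtime check of the CPU's feature flags picks the fast or slow path.
        return __builtin_cpu_supports("avx") ? dot_avx(a, b, n)
                                             : dot_generic(a, b, n);
    }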
> In conclusion, we can see that zapcc is always faster than both gcc and clang.
For testing his template-heavy library, which is a very small data set; and while templates are one of the problems when it comes to C++ compilation speed, they're certainly not the only one.
Let's see the differences when compiling Firefox or the whole KDE suite.
So, while neat, pretty much all of this would be solved by C++ modules and precompiled modules.
(and in fact, is, based on what i've seen. But i still hope these guys get to market and make some money before that takes over the world, because i know how hard it is to do what they are doing :P)
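For readers who haven't followed the modules work, here is a minimal sketch of what that looks like in C++20 (file names and build flags vary by compiler, so treat this purely as an illustration):

    // math.cppm -- a module interface unit; the compiler emits a binary
    // module artifact once, instead of re-parsing a header per include.
    export module math;

    export int add(int a, int b) { return a + b; }

    // main.cpp -- consumers import the prebuilt module:
    //
    //     import math;
    //     int main() { return add(1, 2); }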
I tested zapcc against my codebase, which currently favors 'ccache gcc-6 -flto' over 'ccache clang-3.9 -flto'.
It's a pure C project.
* Build-time went from 58s to 37s with zapcc. I didn't use zapcc with ccache and shouldn't.
* Run-time went from 371s down to 204s. Almost double speed!
So it's clearly an effect of clang-4 over gcc-6, and not so much zapcc.
Then I crosschecked with clang-4. The run-time is entirely based on clang-4, confirmed.
But the build-time with clang-4 was 44s, still 20% slower than with zapcc.
What I learned:
* ccache is horrible. zapcc is much better.
* clang-4 is fantastic, even if its LTO is still too broken to be usable with regard to visibility and inlining.
I found great bugs with new clang-4 warnings.
I had no time yet to use it on C++, where the real advantages come up. On pure C there are just side-effects.
One thing that always confused me about ccache was that it doesn't cache lib generation and executable generation. I know that the authors have insisted (until they were blue in the face) that supporting lib/exe caching would require rewriting ccache... I just don't understand why. Once you know about `-frandom-seed=0`, and you've removed all aspects of non-determinism (__TIME__ & co.), then all that's left is the moral equivalent of `dwarfdump -u <exe>` for each compiler, and you're good to go for deterministic caching.
The architecture of ccache maps preprocessed sources to object files; it uses the compiler to preprocess the source, and it hashes the result, knowing that nothing other than the compiler, command line, and preprocessed source determines the output.
Linking involves far more complexity, with the input files harder to determine. There's no equivalent of -E or -fdirectives-only for linking. A ccache for linking would have to identify and hash all library, object file, and linker script inputs, including those pulled in indirectly by linker scripts, in addition to the toolchain, the command line, and any linker plugins.
It's absolutely possible, and I'd love to see someone do so, but it seems significantly harder than caching compilation.
You'd also want to time the result, and figure out how long the reading and hashing takes compared to linking. ccache misses take only slightly longer than a normal compilation; link-cache misses may take much longer than normal.
On top of that, unlike a compilation cache that seems very likely to hit on the 99% of files not changed in a build, a linker cache would only hit when absolutely nothing has changed in the entire build. It might help for a project that links numerous tiny libraries or binaries (which seems relatively uncommon), but for a project that primarily builds a single library or binary, it'd only help if you rebuild entirely identical sources twice.
(It might, however, speed up Linux kernel builds if you've only changed the code for a couple of modules and not anything in the core kernel.)
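To make that contrast concrete, here's a deliberately simplified sketch of the compile-side key (the ingredients are exactly the ones listed above; the names are made up, and real ccache uses a proper content digest rather than std::hash):

    #include <cstddef>
    #include <functional>
    #include <string>

    // Everything that can change the object file gets folded into the key:
    // the compiler binary, the command line, and the preprocessed source.
    std::size_t compile_cache_key(const std::string &compiler_binary,
                                  const std::string &command_line,
                                  const std::string &preprocessed_source) {
        auto mix = [](std::size_t h, const std::string &s) {
            return h ^ (std::hash<std::string>{}(s) + 0x9e3779b9 + (h << 6) + (h >> 2));
        };
        std::size_t h = 0;
        h = mix(h, compiler_binary);
        h = mix(h, command_line);
        h = mix(h, preprocessed_source);
        return h;
    }

    // A link-side key would additionally need to hash every object file,
    // every library (including ones pulled in indirectly by linker scripts),
    // linker plugins, and the linker command line -- which is the hard part.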
I would love to see a comparison of the performance of compiled programs.
If zapcc creates slower executables but spits them out faster, it could be used during development to speed up iterations. And if the executables are faster, I'm very curious as to how they achieved both faster compilation and faster runtimes.
I think a good example of this is tcc, one of the fastest C compilers I've seen --- because it doesn't do much optimisation at all and is single-pass, it can generate code as it parses, but the output is dismally inefficient.
Another example of ultrafast compilation is Delphi, but once again the generated code looks more like a dumb line-by-line translation with plenty of redundant and unnecessary instructions (making decompiling interesting in that it easily produces something quite close to the original source.)
You might want to compare tcc to just about any compiler with -O0 -- they're a lot faster if they are allowed to generate slow code. It's also super-straightforward to find compiler bugs, if you're lucky enough that it's an O0 bug!
In theory the compiled output should be the same as clang's (if there are no bugs), given that this is just an optimized clang where compilation structures are cached.
Normally, it should make almost no difference compared to the code generated by the same version of clang on which zapcc is based. It may make a difference in the long term if they don't keep up to date with clang trunk.
The value of a caching compiler should really become apparent in incremental builds, as in rebuilding after changing a single file. Yet the author talks about "not seeing any improvements".
Like he said, he might be doing something wrong.
The speedup that is observed anyway might come from the compilers having to re-instantiate templated code every time it's included.
"The value of a caching compiler should really become apparent in incremental builds, as in rebuilding after changing a single file. Yet the author talks about "not seeing any improvements"."
There are millions of reasons this may not be true in C++.
For starters, the use of time and date macros, etc.
Without precise tracking of which source lines depend on which macros (which is super hard, and i don't think they do it; in practice dependency tracking is usually much more coarse-grained), you may not see an improvement.
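A trivial, made-up example of the date/time problem: a file like this produces a different object file on every build, so a cache either has to miss on it every time or know that the output depends on __DATE__/__TIME__:

    // version.cpp (hypothetical): the embedded string changes on every build,
    // so a byte-identical cache hit is impossible unless the cache tracks
    // the dependency on __DATE__ and __TIME__.
    extern const char *const kBuildStamp = __DATE__ " " __TIME__;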
VisualAge C++ was one of the best incremental C++ implementations i ever saw, and even it did not get to this level.
> VisualAge C++ was one of the best incremental C++ implementations i ever saw, and even it did not get to this level.
How did it compare with Energize C++?
I only know both from magazines during those days, even though someone uploaded an Energize video to YouTube.
In regards to VisualAge C++, I think only those of us who were active back then can remember anything about it. Besides the magazines I had with the product review, I've never seen much information posted on the Internet.
Hell, even if this only gives a speedup with template-heavy code I'm on board. We use a number of header-only, template-metaprogrammed-to-death libraries (RapidJSON, Eigen, ViennaCL), and compilation speed improvements would be a huge productivity boost.
I would love to see a C++ compiler implement multi-core optimization and code generation for template instantiations. I feel this can be a big win for cases where you instantiate a template n times and they are basically all independent from each other.
It's not clear that it will help with template instantiation, because it's fairly hard to parallelize.
Certainly possible, but very hard to do "optimally" (IE by sharing work instead of duplicating it).
Given that any initial implementation is likely to duplicate work per thread, this usually cuts into your speedup quite a lot.
Codegen, on the other hand, is pretty much fully parallelizable.
This is the whole reason thinlto exists.
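For what it's worth, you can already get a crude, manual version of parallel instantiation today by declaring the common instantiations extern in the header and putting the explicit instantiation definitions in separate translation units, so the build system parallelizes them. A sketch, with made-up names:

    // heavy.h -- the template, plus a promise that the common instantiations
    // are provided by some other translation unit (C++11 extern template).
    template <typename T>
    T sum(const T *data, unsigned n) {
        T s = T();
        for (unsigned i = 0; i < n; ++i) s += data[i];
        return s;
    }
    extern template float  sum<float>(const float *, unsigned);
    extern template double sum<double>(const double *, unsigned);

    // sum_float.cpp -- instantiated and codegen'd in its own TU:
    //
    //     #include "heavy.h"
    //     template float sum<float>(const float *, unsigned);
    //
    // sum_double.cpp -- compiled in parallel with sum_float.cpp by make -j:
    //
    //     #include "heavy.h"
    //     template double sum<double>(const double *, unsigned);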
Unfortunately, precompiled headers are quite inconvenient to use; there are a lot of limitations in each compiler. For instance, only the first header included in a source file can be precompiled, and other restrictions like that, whereas this is a more general approach. But PCH can bring a really good speedup too, and they're free ;)
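A minimal sketch of that first-include restriction (file names are made up): every expensive include has to be funneled through the one header that appears first in every source file:

    // pch.h -- the single header that gets precompiled; the expensive
    // includes all have to live behind it.
    #include <string>
    #include <vector>

    // widget.cpp -- the PCH is only picked up if "pch.h" is the very first
    // include; putting anything before it silently disables the PCH:
    //
    //     #include "pch.h"
    //     #include "widget.h"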
This was my question too. Zapcc's FAQ says: "Precompiled headers requires building your project to the exact precompiled headers rules. Most projects do not bother with using precompiled headers. Zapcc works within your existing build."
Precompiled headers are currently ignored by Zapcc.
Curious as to how much faster this makes standard workflows where you are re-compiling only a few files and the linker is typically the bottleneck.
I use IncrediBuild in my day job; it's a great help when doing full re-compiles or changing a pervasive header, but offers no help on smaller builds where it doesn't parallelize the link. Zapcc doesn't look to do anything with the linker either.
You can try gold with parallel and/or incremental linking (although incremental linking didn't work on a large library where I would've needed it most)
In a perfect world, I would like everyone who uses clang to benefit freely from this improvement. Should some big, well-meaning corporation just buy out those guys?
The entire point of using a BSD/MIT-style license is that it gives "freedom" to a different set of people than the GPL does. In this case, why is it not perfect for someone to invest money building a commercial product on top of clang? Isn't it just fine that they want to make a lot of money?
If Clang had used a GPL-like license, Zapcc would have been forced to share all their modifications to Clang with the whole world, and we would've all benefited from it -- and maybe the optimizations would even have been merged back into the mainline of Clang.
Why do you think the Zapcc developers would have worked for free on this? It seems pretty clear that they developed the software because they thought they could make some money off it.
And thanks to the fact that they did this, we now know it is possible. Competition will hopefully motivate the Clang developers to develop similar performance improvements in the mainline of Clang. Everyone will benefit.
A Clang user is certainly no worse off than they were yesterday.
I'm not sure I agree. Maybe for some software, but I wouldn't even think about using a programming language without at least one quality open-source implementation, and I think many developers would agree.
Fair 'nuff. But let me ask you this: will those managers ask them to use a language without a high-quality open-source implementation? And if you look at the most popular languages out there (and even many of the fringe ones) the answer is probably not.
All the stored procedure programming languages of commercial SQL servers, .NET before Microsoft opened it up, commercial compilers of Common Lisp, C++ Builder, Delphi, Ada, C and C++ compilers for embedded development (no, clang and gcc aren't the only ones), ColdFusion, Flash, Objective-C (gcc and clang are just a tiny part of the whole stack), COBOL, RPG, NEWP, a few in-house proprietary languages, Java compilers for embedded platforms with extended AOT features...
It doesn't matter if there are open source implementations of language X if you cannot use them on processor Y or operating system Z, and instead have to use the closed source commercial compiler from that processor or operating system vendor.