Make LLVM Fast Again (nikic.github.io)
582 points by notriddle on May 10, 2020 | 235 comments



Am I the only one who wants to see a split into a "fast compile" mode and a "spend hours making every optimization possible" mode?

Most code is executed a lot more frequently than it is compiled, so if I can get a 1% speed increase with a 100x compile slowdown, I'll take it.

I don't want to see good PRs that improve LLVM delayed simply because they cause a speed regression.
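To put a number on the "1% faster for a 100x slower compile" trade, here is a minimal sketch of the break-even arithmetic. The helper name `break_even_runs` and the cost model (a fixed compile cost plus a fixed cost per run) are assumptions made for illustration, not anything from LLVM or the thread.

```cpp
#include <cassert>

// Hypothetical helper, not from the thread. Assumed cost model:
//   fast build: compile cost C,             each run costs R
//   slow build: compile cost C * slowdown,  each run costs R * (1 - speedup)
// The slow build comes out ahead once
//   C * slowdown + N * R * (1 - speedup) < C + N * R
// which rearranges to N > C * (slowdown - 1) / (R * speedup).
inline double break_even_runs(double compile_seconds, double run_seconds,
                              double compile_slowdown, double runtime_speedup) {
    assert(compile_seconds > 0.0 && run_seconds > 0.0);
    assert(compile_slowdown >= 1.0);
    assert(runtime_speedup > 0.0 && runtime_speedup < 1.0);
    return compile_seconds * (compile_slowdown - 1.0) /
           (run_seconds * runtime_speedup);
}
```

With a 60-second build, a 1-second run, a 100x compile slowdown and a 1% speedup, `break_even_runs(60, 1, 100, 0.01)` comes out at roughly 594,000 runs. Under these assumed numbers the trade pays off for widely deployed software, and not for something that runs a handful of times between builds.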


You can already spend as much as you'd like on optimizations if you are using LLVM. Just use a superoptimizer [0]:

    clang -Xclang -load -Xclang libsouperPass.so -mllvm -z3-path=/usr/bin/z3
, or increase the inlining heuristics..., or just create your own optimization strategy like rustc does [1], or...

LLVM is super configurable, so you can make it do whatever you want.

Clang defaults are tuned for the optimizations that give you the most bang for the time you put in, while still being able to compile a Web browser like Chrome or Firefox, or a whole Linux distribution's packages, in a reasonable amount of time.

If you don't care about how long compiles take, then you are not a "target" clang user, but you can just pass clang extra arguments like those mentioned above, or even fork it to add your own -Oeternity option that takes in your project and tries to compile it for a millennium on a supercomputer for that little extra 0.00001% reduction in code size, at best.

Because often, code compiled with -O3 is slower than code compiled with -O2. Because "more optimizations" do not necessarily mean "faster code" like you seem to be suggesting.

[0]: https://github.com/google/souper [1]: https://github.com/rust-lang/rust/blob/master/src/rustllvm/P...


> Because often, code compiled with -O3 is slower than code compiled with -O2. Because "more optimizations" do not necessarily mean "faster code" like you seem to be suggesting.

Indeed, due to patterns of resource use and availability (e.g. the memory wall[0]), compiler optimisations can only solve 1-10% of performance problems a piece of code might have. To paraphrase Mike Acton: The vast majority of your code consists of things the compiler can't reason about.[1]

[0]: https://en.m.wikipedia.org/wiki/Random-access_memory#Memory_...

[1]: [Mike Acton, "How to Write Code the Compiler Can Actually Optimize", GDC2015.](https://m.youtube.com/watch?feature=youtu.be&v=x61H6qEtK08&t...)


Has anyone ever done a Pareto frontier analysis on this? Specifically, this would be for speed rather than size.


What would you expect to learn and how would you do this?

I can't imagine anything useful coming out of it.


Yeah this seems like two very different use cases. Stating the obvious: when debugging I want as fast builds as possible. When shipping I want as fast software as possible.


The only problem being if you want to debug the optimized code ;)


This is true a lot in games - often the "debug" version of a game is literally too slow to reproduce issues, so you have to work with the optimized builds.

...even if the actual debugging would be much easier with a debug build.


Understood. However even then I would hope that games have unit tests and other module tests that can run just a subset of code and not the entire game. When I'm writing code I want to know it builds and the basics work before I run the entire program.

I work on seasonal programs. I can only do fully realistic tests in the month of April (depending on the weather it shifts a few weeks), meaning my window to test is closed for the next 11 months. I have learned to find ways to run my code in less realistic situations.


There are also those bugs that only manifest in the release build.


That happens, but fortunately not often. Most problems show up in either optimized or unoptimized code. If it is in optimized code only it is a compiler bug (rare but I have seen it) or a race condition that won't show up in the debugger either. (there are a few more choices but the above are the big ones in my experience)


In optimized code, it is far more likely that one has a programming error in one's source code that an aggressive optimization revealed, rather than a bug in an industrial compiler. For example, an uninitialized variable that ends up sharing a memory location with some other variable whose life-time doesn't overlap in optimized mode rather than having its own memory location without optimization.

That one is caught by a warning, but there are thousands of possible errors that can be exposed this way.

Compilers have bugs all the time, but code in a write-test-debug-cycle typically has many more.


Which is why equally aggressive debug checks are important


This should be an optimisation that you choose to enable, though, right? It might be prohibitively expensive to run the compile for other reasons (time to do a deploy for a fix, time and cost to run builds for CI etc.) so it has to be something that can be dialled up/down.


-O3 should include those expensive optimizations. If you want fast builds with minimal optimizations, go with -O1


Does clang have -O0?


Yes (clang11 documentation)

  -O0, -O1, -O2, -O3, -Ofast, -Os, -Oz, -Og, -O, -O4
      Specify which optimization level to use:
          -O0 Means “no optimization”: this level compiles the fastest and generates the most debuggable code.
          -O1 Somewhere between -O0 and -O2.
          -O2 Moderate level of optimization which enables most optimizations.
          -O3 Like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).
          -Ofast Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards.
          -Os Like -O2 with extra optimizations to reduce code size.
          -Oz Like -Os (and thus -O2), but reduces code size further.
          -Og Like -O1. In future versions, this option might disable different optimizations in order to improve debuggability.
          -O Equivalent to -O2.
          -O4 and higher
              Currently equivalent to -O3


For fast compiles, in clang, -fsyntax-only parses as quickly as possible. Then you have -O{0,1,2,3,s} for various levels of optimization.

As for the long-baking compile, you have the concept of supercompilation, which will check the branches and determine if there are any places where a function is called with constants, and partially evaluate the function with those constants frozen (similar to partial application / parameter binding, but you get a newly compiled function). But then it has to determine if the branch elision makes it worth dropping the other versions of the function. It's a cool topic that I researched a lot about a decade ago, but I think it's not an area with a lot of active interest in AOT compilers.

https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50....

http://wiki.c2.com/?SuperCompiler

https://en.wikipedia.org/wiki/Stalin_(Scheme_implementation)
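A rough way to picture "partially evaluate the function with those constants frozen" is to write the specialization by hand. In this sketch the names `power` and `power_fixed` are made up for illustration, and a C++ template parameter stands in for the constant that a supercompiler would discover on its own at a call site.

```cpp
#include <cassert>

// General version: the exponent is only known at run time, so the loop and
// its branch have to stay in the compiled function.
inline long power(long base, unsigned exponent) {
    long result = 1;
    for (unsigned i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}

// Specialized version: the exponent is frozen at compile time. Every distinct
// exponent produces a separate function, and the recursion can be flattened
// (power_fixed<3>(x) can reduce to x * x * x).
template <unsigned Exponent>
inline long power_fixed(long base) {
    return base * power_fixed<Exponent - 1>(base);
}

template <>
inline long power_fixed<0>(long) {
    return 1;
}

// Usage sketch: the general and the specialized versions agree.
inline void check_power_versions() {
    assert(power(2, 10) == 1024);
    assert(power_fixed<10>(2) == 1024);
}
```

Each instantiation is extra code, which is the same cost/benefit question the comment raises about whether branch elision justifies keeping several versions of a function.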


Guile scheme is getting a "baseline" compiler for that reason: it will compile fast, but not do any of the fancy CPS things that the regular compiler does.


It all depends on which optimizations you enable and LLVM is very flexible, albeit sometimes you still spend 20% of your time in ISel (Instruction Selection)...


> Most code is executed a lot more frequently than it is compiled, so if I can get a 1% speed increase with a 100x compile slowdown, I'll take it.

Is that really true? I'd have thought most code outside inner loops benefits almost negligibly from optimization.


That really depends on your program. Many servers have sprawling, flat profiles. Most workloads are not computational fluid dynamics. A perspective on hot/cold code ratio is found in "AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers" https://storage.googleapis.com/pub-tools-public-publication-...


Good question. It's worth asking whether optimizations significantly help "most" code.

There are some optimizations that make a big difference everywhere:

- consolidating pure function calls. This can save arbitrary amounts of time. Classic example is strlen being called many times. I know you're not asking about inner loops, but this can change the complexity of loops which is a big big win.

- mem2reg. Kind of self explanatory. Registers are fast but we write C and C++ using addressable objects. Most compilers make a decent effort here even with optimizations turned off.

- global value numbering. This allows loads and stores to be removed/moved. Often prevents cache misses or puts them off until the end of a function when it doesn't block execution.

- strength reduction. Turning your divisions into shifts. Turning your && into &. Etc. It is not unusual for these peephole changes to save 10s of cycles per instruction replaced.

These are also really fast optimizations (mem2reg can be extremely slow if "optimal", but the heuristic versions that everyone uses are quick).
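Two of the items in the list above can be shown as before/after source. This is a hand-written sketch of what the transformations amount to, with made-up function names. Whether a particular compiler actually performs them depends on what it can prove, for example that the string is not modified inside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// As written: strlen() sits in the loop condition, so a naive translation
// rescans the string on every iteration (quadratic in the string length).
inline std::size_t count_spaces_naive(const char* text) {
    std::size_t spaces = 0;
    for (std::size_t i = 0; i < std::strlen(text); ++i) {
        if (text[i] == ' ') ++spaces;
    }
    return spaces;
}

// After consolidating the pure call: the string is not modified in the loop,
// so its length is computed once (linear overall).
inline std::size_t count_spaces_hoisted(const char* text) {
    std::size_t spaces = 0;
    const std::size_t length = std::strlen(text);
    for (std::size_t i = 0; i < length; ++i) {
        if (text[i] == ' ') ++spaces;
    }
    return spaces;
}

// Strength reduction: unsigned division by a power of two gives the same
// result as a right shift.
inline unsigned divide_by_eight(unsigned value) { return value / 8u; }
inline unsigned shift_by_three(unsigned value) { return value >> 3; }

// Usage sketch: each pair computes the same result.
inline void check_equivalence() {
    assert(count_spaces_naive("a b c") == count_spaces_hoisted("a b c"));
    assert(divide_by_eight(100u) == shift_by_three(100u));
}
```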

If you know you won't care, you can mark the function as cold. That said, the compiler might ignore you and decide you cannot possibly really want to disable something like mem2reg.
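For reference, marking a function as cold looks roughly like this with GCC and Clang, which both accept the `cold` function attribute. Other compilers may not, so treat it as a compiler-specific assumption. The functions `report_failure` and `parse_digit` are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// GCC/Clang extension: a hint that this function is rarely executed, so the
// compiler may favor size over speed for it. It remains a hint.
__attribute__((cold)) int report_failure(const char* what) {
    std::fprintf(stderr, "error: %s\n", what);
    return -1;
}

// The hot path stays small; the rare error path calls the cold function.
inline int parse_digit(char c) {
    if (c < '0' || c > '9') {
        return report_failure("not a digit");
    }
    return c - '0';
}

// Usage sketch.
inline void check_parse_digit() {
    assert(parse_digit('7') == 7);
    assert(parse_digit('x') == -1);
}
```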


I don't have time to figure out the correct tradeoff for optimization for each tiny section of code. Just let the compiler do everything and it might be too much but it won't be too little.

Besides, little things add up even outside of tight loops. Slow code in rarely used areas is still noticeable when that rare thing happens.


So, you don't have time to profile to see where the hotspots are, but you care about how fast your program is? Mkay.


It may benefit negligibly from optimization -- or rather, the overall program may benefit negligibly from optimizing that code site -- but it's still gotta be executed a lot more frequently than it is compiled right? Unless it has only one or two users, who both compile it all the time, or something.


How many people would use a cloud compiler?

Let's set aside technicalities and assume it's a real 5x improvement and all the files are mirrored seamlessly.


If you are referring to a caching distributed compile cluster, most companies with large codebases are using one (or should). It helps a lot and can make the difference between taking half a day for a scratch make (i.e. unusable) to getting down to 10 min.

There are open-source ones but I'm also aware of at least two internal, custom-developed ones.




Ken Thompson would be rolling in his grave if not for him being alive.

I could see this having some use in an internal manner for things like game development where you have absolutely enormous codebases (as an abstraction above compile servers and similar).


Many already use one called distcc.


My company would pay for that, up to $50/developer/month. It's a no-brainer if you really get 5x speed improvements, not just for full builds but incremental as well. Of course there are tricky issues there (privacy, lock-in, how intrusive the scheme is), but the build tax in C++ has a very unappreciated impact on productivity, one that the C++ committee seems incapable of addressing.


I think there is no cloud service for this precisely because companies mostly don't want to give out their source code. Although many already rent AWS instances for their devs :)

In many companies the build cluster runs on the developers' workstations themselves, which has the benefit of fully using idling machines. The drawback is higher maintenance due to less reliability of such machines.

Would your company accept internal hosting for such a cluster, i.e. paying for the hardware themselves?


In my particular case no, it would be too much hassle for a small business like ours (we are a remote only firm with only a few people). But I can imagine it would make sense when we grow.


There is no cloud service for that ... except Google Cloud Build.


Are you aware of anyone using it for C++? I couldn't find any mention of C++ in the documentation nor via Googling (sure it will be possible but it seems to involve significant setup)


I'm not really plugged in to the industry. I've used it with bazel cc_binary targets, so I know it works.


Besides the cluster compilers used by C++.

OpenJ9 for Java can use a cloud compiler,

https://blog.openj9.org/2019/05/01/openj9-internship-making-...

.NET when AOT compiled for Windows Store uses cloud compilers, https://blogs.windows.com/windowsdeveloper/2015/08/20/net-na...

And as of Android 10, the cloud compiler is everyone's phone, because PGO data gets uploaded to the play store and improved on each device feedback.

https://android-developers.googleblog.com/2019/04/improving-...


icecream (icecc) is basically this. It’s “cloud” in that it’s distributed, but it requires surprisingly high bandwidth to the cluster, so it basically requires an office LAN connection. Or for your editor to be on that connection a la VSCode’s remote thing.


This is probably the other big reason why "cloud compilers" are rare. Not much point in using one if you can compile the source locally more quickly than you can upload it.


MBED already pioneered the cloud compiler.

Their compiler did optimizations to produce (slightly) better code than any compiler you could buy.

A lot of users were uneasy with it.


It's only at link time that one object depends on another, so you don't need a new compiler to support a "cloud" use case where any number of machines can work on compilation.


A lot of large C++ projects use link-time optimization, since it's the easiest way to get predictably good performance.


This work is critical to compile times improving.

As the author of one of the changes which could have unknowingly caused a 1% regression, I really appreciate this work measuring and monitoring compile times. Thanks to nikic for noticing the regression and finding a solution to avoid it.


I really hope this type of infrastructure gets moved into LLVM itself, and people start adding more benchmarks for all the frontends, and somehow integrating this into the CI infrastructure, to be able to block merging PRs on changes that accidentally impact LLVM's performance, like is currently the case for rustc.

But I guess the LLVM project should probably start by making code-reviews mandatory, gating PRs on passing tests so that master doesn't get broken all the time, etc. I really hate it when I update my LLVM locally from git master, and it won't even build because somebody pushed to master without even testing that their changes compile...

For Rust, I hope Cranelift really takes off someday, and we can start to completely ditch LLVM and make it opt-in, only for those cases in which you are willing to trade-off huge compile-times for that last 1% run-time reduction.


> to be able to block merging PRs on changes that accidentally impact LLVM's performance, like is currently the case for rustc.

rustc's CI doesn't prevent merging PRs that impact performance: while the reviewer can request benchmarks beforehand and choose not to approve the PR if it introduces a regression, all other benchmarks are run after the commits are merged to master.


> somehow integrating this into the CI infrastructure

It is incredible how much extra hardware can make development easier. I find it so funny that our main instinct for speeding up a project's development is to throw people at it (thus the "Mythical Man Month" book), when in reality you should be throwing hardware at it, and probably extra testing.

> For Rust, I hope Cranelift really takes off someday, and we can start to completely ditch LLVM and make it opt-in, only for those cases in which you are willing to trade-off huge compile-times for that last 1% run-time reduction.

I mean, that'd be nice, but I definitely don't see that happening anytime in the next 5 years. The current plan is for it to be for debug only, and due to the IR rust emits, the gap between debug and release can be huge.


Unfortunately sometimes losing performance is the correct tradeoff. However it needs to be carefully considered before you just make it.


Accepting a performance loss is almost always the right trade-off. If it weren't, everybody would be writing their code in assembly at 5LOC/day.


Good read. One thought-provoking bit for me was:

> Waymarking was previously employed to avoid explicitly storing the user (or “parent”) corresponding to a use. Instead, the position of the user was encoded in the alignment bits of the use-list pointers (across multiple pointers). This was a space-time tradeoff and reportedly resulted in major memory usage reduction when it was originally introduced. Nowadays, the memory usage saving appears to be much smaller, resulting in the removal of this mechanism. (The cynic in me thinks that the impact is lower now, because everything else uses much more memory.)

Any seasoned programmer will remember a few such things - you undo a decision made years ago because the assumptions have changed.

Programmers often make these kinds of trade-off choices based on the current state (the typical machines the program runs on, the typical inputs the program deals with, and the current version of everything else in the program). But all of those environmental factors change over time, which can make the inputs to the trade-off quite different. Yet it's difficult to revisit all those decisions systematically, as they require too much human analysis. If we could encode those trade-offs in the code itself, in a form that's accessible to a programmatic API, one can imagine implementing a machine learning system that makes those trade-off decisions automatically over time, as everything else changes, by traversing the search space of all those parameters. Today's programming languages don't allow encoding such high-level semantics, unfortunately, but maybe it's possible to start small - e.g. which associative data structure to use can be chosen relatively easily, and the initial size of a data structure could also potentially be chosen automatically based on benchmarks or even on metrics from the real world.
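For readers who have not seen the trick behind waymarking, this is the general idea of hiding information in a pointer's alignment bits. It is a hypothetical illustration, not LLVM's code: LLVM spread the encoding across several use-list pointers, while this sketch only stores a two-bit tag in a single pointer. The class name `TaggedPointer` is made up.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: int is at least 4-byte aligned, so the two lowest bits of any
// valid int* are zero and can carry a small tag at no extra memory cost.
static_assert(alignof(int) >= 4, "sketch assumes 4-byte aligned int");

class TaggedPointer {
public:
    TaggedPointer(int* pointer, unsigned tag) {
        const auto raw = reinterpret_cast<std::uintptr_t>(pointer);
        assert((raw & kTagMask) == 0 && "pointer must be 4-byte aligned");
        assert(tag <= kTagMask && "tag must fit in two bits");
        bits_ = raw | tag;
    }

    // Mask the tag off again to recover the original pointer.
    int* pointer() const {
        return reinterpret_cast<int*>(bits_ & ~kTagMask);
    }

    unsigned tag() const { return static_cast<unsigned>(bits_ & kTagMask); }

private:
    static constexpr std::uintptr_t kTagMask = 0x3;
    std::uintptr_t bits_;
};
```

The space saving is visible in the layout: the object is the size of one pointer, yet it carries the pointer and the tag. The cost is the extra masking on every access, which is the space-time tradeoff the quoted passage describes.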


I don't think that exploding the state space of your program by making the history of your design decisions programmatically accessible (and changing them regularly to reflect new assumptions) would be good for the quality of the result.


I don't think it's as simple as saying "the state space explodes, and that's bad".

When you say state space, I think about what is dynamically changing. If you can select one of two design decisions e.g. at compile time then, yes, your state space is bigger, but you don't have to reason about the whole state space jointly. The decision isn't changing at run time.
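As a small sketch of a design decision that is selected at compile time and then stays fixed at run time, consider choosing the associative container with a single flag. The names `WordCounts` and `count_word` are hypothetical. The point is only that the code using the container does not change when the decision does.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <unordered_map>

// Hypothetical knob: one compile-time flag picks the container. The choice is
// made when the program is built and cannot change while it runs.
template <bool PreferOrdered>
using WordCounts = typename std::conditional<
    PreferOrdered,
    std::map<std::string, int>,
    std::unordered_map<std::string, int>>::type;

// Code written against the common interface works with either choice.
template <typename Counts>
int count_word(Counts& counts, const std::string& word) {
    return ++counts[word];
}

// Usage sketch: same calling code, two different underlying containers.
inline void check_word_counts() {
    WordCounts<true> ordered;
    WordCounts<false> hashed;
    assert(count_word(ordered, "llvm") == 1);
    assert(count_word(hashed, "llvm") == 1);
}
```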


You have to have tests for all combinations though. At least those combinations that you actually want to use. You get the same problem when your code is a big ifdef-hell.


Testing is important, for sure, but just because you have two parameters with n choices each, does not mean you have to test n^2 combinations. You can aim to express parameterization at a higher level than ifdefs.

For example, template parameters in C++. The STL defines map<K, V>. You don't have to test every possible type of key and value.


I'm pretty sure that you need n^2 tests if you have n non-equivalent choices each. For maps many types are equivalent so you don't need an infinite number of tests.


If the two hypothetical parameters only affect disparate program logic for some or all of their possible choices, they could require as few as 2n tests instead of the full n^2... If I'm understanding the hypothetical right. (It depends on their potential for interaction.)
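The n^2 versus 2n argument can be made concrete by listing the configurations each strategy would test. This is a sketch under the assumption that there are two parameters with n choices each and that choice 0 is the default. The function names are made up.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// (choice for parameter A, choice for parameter B), each in [0, n).
using Config = std::pair<int, int>;

// Needed when the parameters can interact: every pairing, n * n configurations.
inline std::vector<Config> all_pairs(int n) {
    assert(n > 0);
    std::vector<Config> configs;
    for (int a = 0; a < n; ++a) {
        for (int b = 0; b < n; ++b) {
            configs.emplace_back(a, b);
        }
    }
    return configs;
}

// Enough only if the parameters affect disjoint logic: vary one while the
// other stays at its default (choice 0), 2 * n configurations in total.
// The all-defaults configuration (0, 0) is listed once per sweep.
inline std::vector<Config> one_at_a_time(int n) {
    assert(n > 0);
    std::vector<Config> configs;
    for (int a = 0; a < n; ++a) configs.emplace_back(a, 0);
    for (int b = 0; b < n; ++b) configs.emplace_back(0, b);
    return configs;
}
```

For n = 4 that is 16 configurations against 8. The gap widens quickly as n grows, and the cheaper strategy is only valid if the assumption of no interaction actually holds.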


Setting aside the AI angle, perhaps just recording the assumptions in a way that can be measured would be enough.

Tooling, runtime sampling, or just code review could reveal when the assumptions go awry.


Does FFTW automatically generate implementations optimized for each machine?


> I’m not sure whether this has been true in the past

Phoronix.com has a lot of Clang benchmarks over the years.

I recall seeing some benchmark that showed that as Clang approached GCC in performance of compiled output, the compile speed also went down to approach GCC levels.

But I haven't managed to find that exact benchmark yet.


An expected result when they copy GCC's features, and functionality. Isn't it?

Pretty much the sole point of Clang/LLVM to the corporate sponsors is to get GCC, but without GPL.


All good sibling comments. Also Clang is Chris Lattner's baby from his days back at University of Illinois. It was never intended to be "GCC without the GPL". It demonstrates an entirely different compilation paradigm where you optimize abstractly based on clang's IR. If you can compile to IR (a clang front end) you can apply the same optimizations universally. Obviously it has its practical hiccups like still needing to have an architecture specific optimization pass but it's a cool idea and has proven rather successful.


In defence of the OP, notice that they did not talk about the original intent of clang, which is as you mention. They talk about the main reason of its corporate sponsors, which is an entirely different thing. It seems plausible to me that these sponsors may have different motivations than the creator of the project.


That idea is nothing new, it has been used in IBM's RISC project, Amsterdam Compiler Kit, Microsoft's Phoenix compiler toolkit, IBM i, IBM z/OS, Unisys ClearPath and plenty others.

And getting back to GCC, I had to study GIMPLE during my compiler assignments.

What LLVM has going for it versus GCC is the license, especially beloved by embedded vendors; companies like Sony, Nintendo, SN Systems, and CodePlay can save some bucks in compiler development.


Chris Lattner wrote a chapter about LLVM in The Architecture of Open Source Applications.

https://aosabook.org/en/llvm.html

Based on that, my understanding was that while intermediate representations were certainly not new, being strict about not mixing the layers was still quite rare. He specifically claims that GCC's GIMPLE is (was?) not a fully self-contained representation.

I'm not an expert in any of this. Just sharing the link.


My experience with the Amsterdam Compiler Kit (ACK) is that while ACK successfully managed to separate frontends, optimizers and backends using a single intermediate language, in the end it was the intermediate language that held it back.

The intermediate language was strongly stack oriented, to the extent that local CPU registers were mapped to stack locations. This worked well for pdp-11, vax, mc68k, and to some extent x86.

But when Sun's SPARC got popular it became clear that mapping stack register windows was not going to result in good performance.

One option would have been to define a new register-oriented intermediate language, just like LLVM has now. But by that time research interests at the VU Amsterdam had shifted and this was never done.


Interesting historical anecdote: ACK was briefly considered as a base for implementing GCC, but was rejected for licensing reasons.


Here is a paper about the PL.8 research compiler used in IBM's RISC research in the mid-70s.

http://rsim.cs.illinois.edu/arch/qual_papers/compilers/ausla...

Key point: a memory-safe systems programming language that would apparently be too slow for the target hardware becomes, thanks to the compiler's IL representation and multiple execution phases (sound familiar?), usable for writing an OS on 70s hardware.


> What LLVM has going for it versus GCC is the license, especially beloved by embedded vendors; companies like Sony, Nintendo, SN Systems, and CodePlay can save some bucks in compiler development.

The license is probably considered an advantage by many companies. However, it is definitely not the only reason for LLVM's success. There are many technical reasons as well, e.g. cleaner code and architecture. My personal impression is that a lot of research and teaching has moved from GCC to LLVM as well, and universities usually do not care that much about the license.

Yes, GCC has GIMPLE (and before that just RTL), but it is not as self-contained as LLVM's IR. In GCC, the front end and middle end are quite tangled, on purpose, for political reasons. Nevertheless I agree that LLVM isn't as revolutionary as the poster you are replying to is claiming; reusing an IR for multiple languages was done before. However, I don't think any other system was as successful as LLVM at this. For example: Rust, Swift, C/C++ via clang, Julia, Fortran, JITs like JSC and the Azul JVM (which are or were using LLVM as a compilation tier), GPU drivers, etc. Those are all hugely successful projects, and if you ask me this is an impressive list already, while not even complete. It seems most new languages these days use LLVM under the hood (with Go being the exception). IMHO this is also because LLVM's design was flexible enough to enable all those widely different use cases. GCC supports multiple languages as well, but it never took off to the degree that LLVM did.

I don't know all the compilers you mentioned, but how many of those were still maintained and available on the systems people cared about by the time LLVM got popular? Are those proper open-source projects?


Yes that is an impressive list, but I bet if LLVM had gone with a license similar to GCC, and everything else remained equal, its adoption wouldn't be as it is.

No, those projects aren't open source at all. They used their own compilers, or forked variants of GCC which they couldn't reveal thanks to NDAs. Now, thanks to clang's license, they have replaced their implementations, only contributing back what they feel is relevant to open source.


Is that really an "entirely different compilation paradigm?"

Most (all?) GCC frontends compile to a common IR. The main difference really is that GCC doesn't market that as an interface for interacting with the compiler. In LLVM, the IR is the product; in GCC it's the individual language compilers.


> Clang is Chris Lattner's baby from his days back at University of Illinois.

No, that's LLVM. Clang was Steve Naroff's baby after he stepped down from managing the team.


And for those who think there's some malicious (as opposed to friendly) competition between the two projects: Steve Naroff was also instrumental (both technically and managerially) to get Steve Jobs to pay us to get Objective C++ working in gcc back in the NeXT days.


Is Objective-C++ really that old? For some reason I had the idea that it was from this side of the millennium…


Yes, Objective-C++ was part of NeXTSTEP SDK.

https://www.nextop.de/NeXTstep_3.3_Developer_Documentation/R...


I think that’s a typo. Objective-C, as you may know, is about as old as C++ (Wikipedia says it’s from 1984 and C++ from 1985, but both were evolved over several years, so I wouldn’t categorically say Objective-C is older)

I think Objective-C++, basically an implementation of Objective-C that makes it possible to link with C++, must be from 2000 or later, but even a semi-exact date isn’t easy to find (probably in Clang’s release notes)


Objective-C++ was part of the NeXTSTEP SDK, so pretty much 20th century tech.

https://www.nextop.de/NeXTstep_3.3_Developer_Documentation/R...


Thanks for the correction. I thought Objective-C++ was something Apple concocted early 21st century.


Yeah nowadays it is very hard to find anything about Objective-C++, even Apple has taken the documentation down, so only old timers still have some references.

Even the link I provided, who knows for how long it will still stay up.


Given that I worked on it, and it shipped, in the mid 90s, it certainly is not “must be from 2000 or so.”

Plus in this case what has Clang to do with it given that we are talking about gcc?


> an entirely different compilation paradigm where you optimize abstractly based on clang's IR

Doesn't most of GCC's optimisation happen at the level of an internal IR?


Steve Naroff spoke about the motivation for clang at the (first, I think) LLVM meet-up, which I also attended: http://llvm.org/devmtg/2007-05/

Slides:

http://llvm.org/devmtg/2007-05/09-Naroff-CFE.pdf

Video:

https://www.youtube.com/watch?v=Xx7zFn31PLQ&feature=youtu.be

So there was a real technical need. IIRC, there was also a personal need, as Steve wanted to do something else, and I am sure not being dependent on gcc was also a big deal.

From long experience, Apple doesn't like to be dependent on others for crucial bits of their tech stack. Relations with the gcc team weren't the best, even without the GPL issues, although the new GPL v3 was also seen as a problem. I think Apple switched before having to adopt v3.


You seem to be implying having a competitor to GCC is a bad thing? While there is something to be said for duplicated effort, I think it's actually helped improve GCC a lot because they also 'copied' good things Clang did (like much better warnings, built-in static analysis etc.). So really everybody has benefited.

It's good to have a bit of diversity and options, especially when they're all trying to be compatible.


Having two good compliant compilers is great for C++ too: it keeps both compilers honest and also makes spotting weird behaviour easier in big codebases (i.e. to first order if your code results in different behaviour in GCC or Clang then it could be dangerous).

Also has the added benefit of reducing the need to use compilers like Intel's (not sure on current benchmarks) but I really wouldn't want to ship something for AMD CPUs with Intel's compiler.


Contrary to what gets discussed here, C++ has more than just two good compliant compilers.

https://en.cppreference.com/w/cpp/compiler_support, and embedded compilers or those for more legacy-like platforms aren't even listed.


Besides GCC, Clang, and MSVC++, most other C++ compilers license the EDG frontend, so there's less heterogeneity than you would expect. I don't actually know if there's another implementation of C++11 besides the ones I listed. Even Microsoft now uses the EDG frontend for Intellisense, rather than their own (more incorrect) frontend.


It is still more than a gcc vs clang thing.


As in "Free" (One open source and GPL) compliant compilers.


Well that was missing.


> So really everybody has benefited.

> It's good to have a bit of diversity and options

It can be consistent to hold that those two statements do not hold in this case. GCC requires that developers uphold certain user freedoms (i.e. compiler extensions must be free software). Clang allows users' freedoms to be more easily violated.

It's fine if you want to hold those two opinions, but if you state them as if they're the only opinions, that's not great. If you don't acknowledge that they're predicated on beliefs such as "user freedom isn't important enough to outweigh software improving in other ways" (or such), then of course you'll have trouble understanding why one might view clang/llvm in the negative light of effectively enabling "GCC, but without GPL".


I've heard this said before. Why would someone want this? AFAIK the GPL isn't really relevant unless you're modifying and redistributing GCC itself. Even if you use modified GCC internally for compiling for-profit software you just need to allow the employees who use GCC to see your modified code, which doesn't seem like a big deal since you already trust them with your application code.


One argument I've heard before is that GCC's architecture is intentionally designed to make it difficult to extend GCC or integrate it into other tools such as IDEs, because that sort of modularization could enable interoperability with non-GPL software. I'm not sure whether this argument has any merit, but see e.g. this email from ESR: https://gcc.gnu.org/legacy-ml/gcc/2014-01/msg00209.html

Edit: that email from ESR cites a talk from an LLVM developer which goes into more detail about this argument and about a lot of the architectural differences between GCC and LLVM: https://www.youtube.com/watch?v=lqN15lrADlE. I'm not sure how up-to-date this is though, as it looks like it was recorded in 2012.

At least in the Apple ecosystem, the results of LLVM certainly spoke for themselves. Immediately after hiring Chris Lattner and adopting LLVM, Apple's developer experience began to improve massively in a very short time, coming out with automatic reference counting, Objective-C 2.0, the Clang static analyzer, live syntax checking and vastly improved code completion, bitcode, Metal shaders, and Swift within just a few years. Of course, I don't know how much of this was due to technical reasons rather than legal reasons (but Apple did make most of this work open-source).


While RMS is opposed to such a thing there is no architectural reason for it not to happen, just little to no support for anything not GPLed. ESR has no idea what he is talking about. It is true that GCC's architecture dates to the 80s and so isn't modular in the same way LLVM is, but that is hardly due to some sort of policy-embodied-in-code.

As well, RMS no longer controls gcc, and has not since the late 90s when I engineered a fork of gcc from the FSF and into the hands of an independent steering committee. At the time such an idea was radical...thankfully it is now commonplace.


Thank you for the clarifications, this is really quite interesting. Compiler development is fascinating, and I love these sorts of threads because I always learn something new about the technology/culture/history.


Because Apple modifies and distributes LLVM in their proprietary developer tools.


> AFAIK the GPL isn't really relevant unless you're modifying and redistributing GCC itself.

A lot of interesting LLVM use-cases are all about that, adding custom frontends or backends used in software that is distributed. Some random examples:

https://www.elementscompiler.com/elements/island/

https://ispc.github.io/

https://www.khronos.org/spir/


Just like GCC has had Cobol, Ada, Fortran, Modula-2, Pascal, Java, Go and plenty of other ones throughout the years.


And how many of those were/are commercial or similar? The point is that GCC requires you to adhere to the GPL, which may not be desirable.

The Island platform that I linked to is one such commercial example, made and sold by RemObjects[1]. OpenCL support in graphics drivers is another example.

[1]: https://www.remobjects.com/


GCC's C, C++ and Fortran frontends are commercial products of at least Red Hat and SuSE, for example see https://developers.redhat.com/products/developertoolset/over... and https://suse.com/c/suse-linux-essentials-where-are-the-compi...

GCC's Go frontend is still commercially supported as part of RHEL7 but was replaced by the more popular Go implementation in later versions.

GCC's Java frontend used to be commercially supported in the days before OpenJDK.


GNATPro is commercial, which is GNAT plus some extra stuff and commercial support from AdaCore.

Modula-3 frontend was commercial, by Elego Software Solutions.

Then naturally Objective-C and Objective-C++ frontends used on NeXTSTEP and OpenSTEP.


It is somewhat easy to work and modify llvm tools: an optimization pass for instance. Also, LLVM tools were designed to be interchangeable. One can pick specific parts of LLVM and integrate into the workflow of others.


Distribution is a common use case. Not as much for a compiler, but for other things. My company spends a lot of time (money) ensuring we can ship the right source code when asked even though paying us $5 for it is silly when you can get it from the internet for free.

I'd say nobody has ever asked, but that isn't true. One person in the test group actually read the entire eula and sent us $5 to get the source code. (I found out when his letter was returned to sender - we officially went through the entire release process just to fix the address at great expense. Even though it was still internal not released legal demanded we show good faith in correcting the problem) testers like that are worth far more than anyone pays them.


You don't technically have to disclose internal mods to all employees because everyone is part of a single legal entity and no binary distribution occurs.


Perhaps but then you will kick yourself the first time you want to give a customer or business partner access to your modifications. Unforeseen business needs or opportunities can always make you need to distribute a tool you previously thought would only be internal. I'm not saying this is a slam-dunk reason to not use GPL tools, but I'm just pointing out that there is a legitimate worry here that is not fully alleviated by your point.


Because Apple is intensely allergic to the GPL.

The reasons are:

- As somebody else mentioned, Apple redistributes developer tools, clang being the poster child

- Since they release OS products, they don't want to co-mingle their software with GPL code. (So they use an older bash on Mac OS X.)

- fear of an Apple developer quietly copying GPL source into a commercial product (well-founded, actually)

- Apple Legal exerting an "abundance of caution" on IP

- at this point, it's institutional. When I worked there, Linux and MySQL were forbidden, for example, but that has relaxed recently.

Also, I think you misunderstand the GPL. If you distribute modified gcc, anybody receiving it can ask for sources. So employees plus end-users.

(One of the strangest examples is that Yamaha uses real-time linux in their synths, and you can download the GPL portions. I can't imagine a musician ever wanting to do that!)

Source: ex-Apple.


Specifically GPLv3. One reason, I think, is the tivoization clause. Users must be able to modify GPLv3 software and then run the modified version. If they included a GPLv3 bash into iOS, iOS users must be able to modify bash and use the modified one instead.

Your synth example is a good one actually. As a synth owner, I'd probably love to be able to replace the software on it with modified versions from the Internet or modify it myself. Linux is GPLv2, though.

How that morphed into disallowing GCC, idk. Maybe they want to prohibit users from installing their own compilers at some point?

Others have mentioned the patent clause. That one seems reasonable as well. In fact LLVM uses the Apache 2.0 license which also has a patent grant, albeit with a smaller scope. Apple could probably file a patent for a feature, then get a university department to implement that feature, then sue other LLVM users (like Sony). With the GPLv3 that loophole does not exist.


> Apple is intensely allergic to the GPL.

Their actual marketplace behaviour demonstrates that they're allergic to GPL version 3 specifically, not the GPL or copyleft in general.


Which I think is reasonable - I do think GNU went too far with the version 3 licenses. I think Rob Landley's perspective is interesting, where he was involved with Busybox and the legal action against companies violating the GPL with that software, but later created an alternative to Busybox that was more permissively licensed because he felt that the whole GPL legal action exercise had been counter-productive for open source software in general (net effect of not encouraging many companies to contribute back but instead just making many companies avoid GPL-licensed software altogether).


They’re fairly allergic to copyleft these days; I can’t recall them adopting a new project with any version of GPL for quite a while.


There aren't so many GPL 2.0–only projects out there to adopt.


Isn't Apple clang open source anyway? So what is really the point?

Even distributing GPL software isn't a big deal, nowadays even Microsoft does that, shipping an entire Linux distribution in Windows!

There are no technical motivations for not using GPL software, you can do that as long you respect the GPL license (i.e. release the modified source code).

I think that what Apple does is more a policy to go against the FOSS community for political reasons than anything else, and to me that is bad, in a world where now even Microsoft is opening up a lot to the open source world.


[Ex FAANG here]

> I think that what Apple does is more a policy to go against the FOSS community for political reasons than anything else

This is the real reason why FAANGs push for non-GPL licenses.

GPL's end goal is to build a community where developers, testers, power users and regular users connect with each other and share knowledge, not just code.

FAANGs want to wedge themselves as the middleman between developers and end users. They view such community as a threat.


> FAANGs want to wedge themselves as the middleman between developers and end users.

well... ostensibly that is where the money can be made (at the point they meet the end-user)


> Isn't Apple clang open source anyway? So what is really the point?

The best explanation I have seen is the speculation that Apple's software patents are seen as a critical part of Apple's business model and competitive strategy, especially 10 years ago when their anti-GPL stance was formed. The GPLv3 patent clause adds risk, especially the patent agreement clause, and if you intend to spend millions on software patent lawsuits then staying away from GPLv3 looks much more reasonable, especially if you ask the patent lawyers.


> (So they use an older bash on Mac OS X.)

Presumably you mean that Apple specifically avoids GPL3 then, because bash was never distributed under anything other than the GPL to my knowledge. Bash moved from GPL2 to GPL3 though.


They've been on a general GPL purge rampage, in each macOS release there's less and less GPL software (there was some blog post with numbers).

As for bash, in the latest release the default shell is zsh, though I think the old bash is still there, although clearly on the way out.


Apple's only overreacting due to a deal Jobs made in the late '90s.


I don't think that was the original motivation, and Clang also introduced other features that GCC didn't / doesn't have: good error messages, cross compilation without having to recompile the compiler, library interface, etc.


We plan on starting to track compile times for Linux kernel builds with llvm. If you have ideas for low hanging fruit in LLVM, we'd love to collaborate.


Shameless plug: https://github.com/dandavison/chronologer runs a benchmark (using hyperfine) over every commit in a repository (or specified git revision range) and produces a boxplot-time-series graph using vega-lite. It works but is rough and I haven't tried to polish it -- does another tool exist that does this?


This is interesting. I'm working in epidemiological modelling atm and something like this would be pretty useful to run in a GitHub Action, CI-style, to find performance regressions over time.

I did a quick Google and found this: https://github.com/marketplace/actions/continuous-benchmark


This is something that I've been meaning to work on for a while - I'm doing a hardware project ATM so it won't be for a while - but there seems to be a strong use case for a generic software-tracker app.

Lots of projects have "are we fast yet" type graphs but I'm not aware of a generic tool that also allows you to set alerts for fine grained benchmarks (I made a toy that alerts you to x-sigma increases in cache misses when testing compiler backend patches for example).

One of those projects that I actually want to build but slightly too dull to finish.
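
For what it's worth, the "x-sigma" alert mentioned above can be sketched in a few lines. This is only an illustration under my own assumptions: the function name `is_regression`, the use of the sample standard deviation, and the one-sided check (only increases count) are all made up for the example, not taken from any existing tool.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true when `sample` is more than `sigmas`
// sample standard deviations above the mean of `baseline`.
// With fewer than two baseline points there is no spread to compare
// against, so nothing is flagged.
bool is_regression(const std::vector<double>& baseline, double sample,
                   double sigmas) {
    if (baseline.size() < 2) return false;

    double sum = 0.0;
    for (double v : baseline) sum += v;
    const double mean = sum / static_cast<double>(baseline.size());

    double squared_diffs = 0.0;
    for (double v : baseline) squared_diffs += (v - mean) * (v - mean);
    // Sample (n - 1) standard deviation of the baseline measurements.
    const double stddev =
        std::sqrt(squared_diffs / static_cast<double>(baseline.size() - 1));

    // One-sided: only a metric that got worse (larger) raises an alert.
    return sample > mean + sigmas * stddev;
}
```

Feeding it, say, the cache-miss counts of the last N commits as `baseline` and the new commit's count as `sample` gives the alert. Note that a perfectly flat baseline has a standard deviation of zero, so any increase at all would be flagged; a real tool would probably want a minimum threshold as well.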


Well this is interesting. I thought I was the only one who noticed things getting slower. For a couple releases now I've been thinking I was going crazy, as if something was only ever getting slower on my own machines. Glad to realise someone else illustrating some data to prove it. Thanks, I'll definitely watch this conversation play out as others realise the obvious..


Pretty cool improvements. For any large project profiling it, making it faster, and preventing or reverting regressions can be a full-time job. Perhaps LLVM project needs such a person in such a role. Still, I question the utility of timing optimized builds. Usually when I have to wait for the compiler it's an incremental fastbuild to execute unit tests. Optimized builds usually happen while I'm busy doing something else.


The problem with that is LLVM non-optimized codegen is so bad that many projects build with -O2 even in debug mode.


Really? It's good enough that Google used clang for fastbuild (tests) for many years before switching release builds off GCC.


Building llvm+clang from source is also ludicrous. 70 GB of disk space and an hour to build is ridiculous. Static linking is the culprit here: binaries hundreds of MB in size are a catastrophe for the cache and memory subsystem. The funny thing is that my project also uses modules in D. Building the D compiler takes 10 seconds including unpacking of the tarball.


I build LLVM+Clang regularly and it definitely does not take 70GB.


They're probably doing a debug build, which is incredibly disk intensive in my experience.


llvm+clang v10 on Linux builds 8 GB in bin, 13GB in lib and >5 GB in tools + things here and there. That's ~30 GB. Then you need that much for the install. So hard requirement 2 x 30 GB + some slack. Last time I built it was around version 4 and it did not need that much disk space.


> llvm+clang v10 on Linux builds 8 GB in bin, 13GB in lib and >5 GB in tools + things here and there.

just checked and my build folder with llvm & clang is 3GB. That's a release build (pass -DCMAKE_BUILD_TYPE=Release !!) - you don't need a debug build unless you're hacking on llvm itself

> Then you need that much for the install

you want make install/strip, not make install (but why do you need to install ? you can run clang from the build dir just fine)


Thank you for your tips. It's a pity that this information is not very visible on the quick start page.


I’ve found that it’s much worse on Linux for whatever reason. On macOS the build usually fits within 20 GB.


Different linker


Can you clarify how using dynamic linking would be better for the cache? The amount of code that's in memory/cached would be the same, no?


LLVM is both a library and a collection of many tools building on that library. Static linking means that almost all of the library code is linked into all of those tools.

That's extremely wasteful, especially for debug builds.

Luckily, shared library builds of LLVM are easy. And gp might want to invest in a Threadripper. It's possible to compile all of LLVM in a few minutes nowadays :)


I do not choose the hardware (big bureaucratic organization). I will try the LLVM_LINK_LLVM_DYLIB and LLVM_BUILD_LLVM_DYLIB options next time. By default it builds static binaries and libraries requiring gigabytes of disk space.


A shared object, even if mapped in 300 running apps will be loaded only once in physical memory (at least for the code segment). Statically built binaries will have each another copy of the code that will be loaded every time the binary is started. So it will be loaded in memory 300 times.

I just checked the size of the clang-10 binary that was built. It is fucking 1.9 GB big (gcc 9.2 on the same machine is 6.1 MB).


> It is fucking 1.9 GB big (gcc 9.2 on the same machine is 6.1 MB).

but you're comparing a debug, unstripped with a release, stripped build ! (also, 6.1 mb seems low for GCC ? the main GCC binary is cc1plus, not gcc / g++)


That's what building from source builds by default. I limited the build to X86, since I don't need the cross-compiling capacity. There's probably a combination of options that will reduce the sizes and such, but by default it's gigabyte-sized, statically linked binaries.


Disable LTO, thinLTO and use the gold or lld linker, most of your problems except compilation time will go away.


> I can’t say a 10% improvement is making LLVM fast again, we would need a 10x improvement for it to deserve that label. But it’s a start…

It’s a shame, one of the standout features of llvm/clang used to be that it was faster than GCC. Today, an optimized build with gcc is faster than a debug build with clang. I don’t know if a 10x improvement is feasible, though; tcc is between 10-20x faster than gcc and clang, and part of the reason is that it does a lot less. The architecture of such a compiler may by necessity be too generic.

Here’s a table listing build times for one of my projects with and without optimizations in gcc, clang, and tcc. Tcc w/optimizations shown only for completeness; the time isn’t appreciably different. 20 runs each.

  ┌─────────────────────────────┬──────────┬──────────┬──────────┬─────────┬────────────┬────────────┐
  │                             │Clang -O2 │Clang -O0 │GCC -O2   │GCC -O0  │TCC -O2     │TCC -O0     │
  ├─────────────────────────────┼──────────┼──────────┼──────────┼─────────┼────────────┼────────────┤
  │Average time (s)             │1.49 ±0.11│1.24 ±0.08│1.06 ±0.08│0.8 ±0.04│0.072 ±0.011│0.072 ±0.014│
  ├─────────────────────────────┼──────────┼──────────┼──────────┼─────────┼────────────┼────────────┤
  │Speedup compared to clang -O2│        - │     1.20 │     1.40 │    1.86 │      20.59 │      20.69 │
  ├─────────────────────────────┼──────────┼──────────┼──────────┼─────────┼────────────┼────────────┤
  │Slowdown compared to TCC     │    20.68 │    17.20 │    14.72 │   11.12 │          - │          - │
  └─────────────────────────────┴──────────┴──────────┴──────────┴─────────┴────────────┴────────────┘


> Today, an optimized build with gcc is slower than a debug build with clang.

Did you mean "an optimized build with gcc is faster than a debug build with clang"?


If that's the case, an optimized build with clang is also often faster than a debug build with... clang itself.

The reason is that many of the optimization passes that run first, like dead code elimination, can remove a lot of code early on, so "optimized" builds end up processing significantly less code, which is inherently faster.

The OP might just not be aware of what a "debug build" is. The goal of a debug build is for the binary to execute your code as closely to how you wrote it as possible, so that you can easily debug it.

Their goal isn't "fast compile-times". If you want fast compile-times, try using -O1. At that level, both clang and gcc do optimizations that are known to be cheap and remove a lot of code, which speeds up compile-times significantly. Another trick to speed-up compile-times is to use -g0, and if you do not need exceptions, use -fno-exceptions, since those make the front-end emit much less data, which results in less data having to be processed by the backends.


In my testing, -O1 results in slower compile times than -O0.

Emitting debug symbols doesn't change compile times.


You might have an interesting project, for all my C++ projects, -O1 is significantly faster than -O0 (~2x faster).

Or maybe my projects are the interesting ones :D


Ah - you are using c++.

My project is c, which is why I can use tcc.

Anyway, that makes sense; in c++, there's a lot of 'extra' stuff, single lines of code that add up to much more than they would seem. I bet -O1 lets the compiler inline a lot of std::move, smart ptr semantics; elide monomorphisations, copy constructors/RVO; etc. Which just means less code to spit out the backend.


Ah right, for some reason I thought you were talking about C++.

Yes for C what you mention makes perfect sense.

I agree with you about C++ as well. In particular, C++ templates end up expanding a lot of duplicate code, and at O1 the compiler can remove them.


If you're comparing optimizations, I'd also want to see the runtime of a reference program.


How did you make the table in your comment?


I used j[1]'s automatic table formatting for arrays of boxes. Best documentation I can find on it is this[2]. Can explain more if you're interested.

1: https://www.jsoftware.com/

2: https://code.jsoftware.com/wiki/Typesetting/Box_Drawing_Char...


:-))) Yay IBM-era text graphics. Brilliant! :-)))


I think this is a worthy effort :-) I find the compile times of rust to be quite a big negative point.

However:

> For every tested commit, the programs are compiled in three different configurations: O3, ReleaseThinLTO and ReleaseLTO-g. All of these use -O3 in three different LTO configurations (none, thin and fat), with the last one also enabling debuginfo generation.

I would have thought for developer productivity tracking -O1 compile times would be better wouldn't it?

I'm happy for the CI to spend ages crunching out the best possible binary, but taking time out of the edit-compile-test loop would really help developers.


Both are worth tracking. If O3 is doing useless work then I'll take the speed up. If it is twice as long for a 1% improvement I'll take it.


Hmm. This is one of the unexpected upsides to systems using JIT compilation that I guess we tend to take for granted. The very fact that a JITC runs in parallel to the app means the compiler developers care intensely about the performance of the compiler itself - any regression increases warmup time which is a closely tracked metric.

As long as you can tolerate the warmup, and at least for Java it's not really a big deal for many apps these days because C1/C2 are just so fast, you get fast iteration speeds with pretty good code generation too. The remaining performance pain points in Java apps are things like the lack of explicit vectorisation, value types etc, which are all being worked on.


I would greatly greatly appreciate an effort to benchmark builds without optimizations too. We've seen some LLVM-related slowdowns in Crystal, and --release compile times are far less important than non-release builds to us.


A couple of things a C++ developer can do is to put template instantiation code into a .cpp file, where possible.

"#pragma once" in the header files helps as does using a pre-compiled header file.

Obviously, removing header files that aren't needed makes a difference too.
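
A minimal sketch of the first suggestion, with both "files" squeezed into one listing so it compiles on its own. The `Stack` template and the file names in the comments are hypothetical, purely for illustration. The idea is that the header declares the instantiation `extern`, so includers skip generating it, and exactly one .cpp file provides it.

```cpp
// ---- stack.h (hypothetical header) ----
// In a real header this is where "#pragma once" or an include guard goes.
#include <cstddef>
#include <vector>

template <typename T>
class Stack {
public:
    void push(const T& value) { items_.push_back(value); }

    // Removes and returns the most recently pushed element.
    // Calling pop() on an empty Stack is undefined, as with std::vector.
    T pop() {
        T value = items_.back();
        items_.pop_back();
        return value;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<T> items_;
};

// Explicit instantiation declaration: translation units that include
// the header should not instantiate Stack<int> themselves.
extern template class Stack<int>;

// ---- stack.cpp (hypothetical source file) ----
// Explicit instantiation definition: Stack<int> is compiled once, here.
template class Stack<int>;
```

Whether this pays off depends on how many translation units use the same instantiation and how heavy the template is. For a tiny class like this one the saving would be negligible.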


`pragma once` doesn't do anything that a well-written header guard doesn't.


to Nikic, thank you for this effort.


The root cause of the issue is that they should make mandatory for each pull request merged into llvm to (almost) not regress performance. The CI should have a bunch of canonical performance tests. If it was made mandatory from the start llvm could have been far faster, it is not too late but it's time to put an end to this mediocrity


You may not know, but:

1) LLVM does not use pull-request and code-review isn't mandatory for the main contributors.

2) until a few months ago LLVM didn't even have any testing system before the code is pushed to the master branch: folks just push code directly after (hopefully) building/testing locally.

The real root cause is that no one cares enough to really invest deeply into this. There is also not a clear community guideline on what is acceptable (can I regress O2 compile time by 1% if I improve "some" benchmarks by 1%? Who defines the benchmarks suite? etc.)


To your last paragraph, there's no real technical leadership in LLVM because none of the original contributors are good project stewards, e.g. Lattner is just chasing the next shiny.

There are occasionally some feeble attempts to improve that, but of course people are wary of the potential politics.


Although tracking it would help, I see 2 issues (in opposite directions): I think there are times when slower performance is an acceptable cost. And, I think that if you allow tiny slowdowns, over time we'll get back here. There's judgment involved.


Even just printing the metrics on every pull request, and not gating on them, would make people think about it more.

It would be expensive in terms of CI hours, but at least for the community as a whole it would probably be worth it on something run as often as a compiler.


You could make the argument that:

- a slower build time should not be made at the expense of a more extensible compiler - one that can be modified easily to add capabilities and features to the build output

- a slower build time is acceptable if the build result executes faster or more efficiently. One slower compile vs one million faster executions is keeping your eye on the prize.


The argument is simple IMO:

* release target build times aren't an issue. They can be done overnight and aren't part of the work cycle.

* un-optimized build times are part of the work cycle and should be as speedy as possible.


> * release target build times aren't an issue. They can be done overnight and aren't part of the work cycle.

Emphasis added. This isn't true for many use cases. There are times when release build + single run is faster than debug build because run time is relatively long (e.g. scientific sims with small code bases + big loops). There are times when debug builds simply aren't sufficient (e.g. when optimizing code).


OK that's true but I think my point still stands. Someone who is doing very heavy scientific computation with long-run times will still prefer a release build that is optimizing for run-time speedup over compile-time speedup, within reason of course.


I agree the point still largely stands, that's why I added the emphasis. Maybe I should have made the intent of that clearer.


I was going to counter that doesn't WebKit use LLVM to compile hot JavaScript code paths, but it turns out that they have already replaced it with a bespoke optimizer with faster compilation times. https://webkit.org/blog/5852/introducing-the-b3-jit-compiler...


This all depends on build scenario/configuration.

-O3 - take all the time you need, -O2 and below - please don't regress the performance of the compiler - as article states - we are happy enough with the level of output we currently get.


Yes, exactly, it's a question of what you ask from the compiler.

Not sure whether the -O2 and -O3 system is the exact right choice to communicate that. But any better system would also preserve the user's ability to make this trade-off.

Though I don't think there's necessarily anything magic about -O2. They could conceivably also only protect -O0 and perhaps -O1. Or give finer grained control over the trade-offs.


-O2 is often the only usable output level in some scenarios even during development so it's important to keep it usable in development iterations, -O3 is "I don't care about compile time"


In that case, landing features decreasing benchmark performance should be accompanied by a corresponding performance increase. The WebKit project has done this for nearly 20 years.


I think it's mostly a false dichotomy * I think there are times when slower performance is an acceptable cost *

We should significantly slow down compile time for any non-negligible win in runtime performance, BUT this slowdown should only be incurred in optimised builds (>= O1) and have no side effect on debug compile time, which should be fast; it almost does not matter if debug builds get slower at runtime.


Wouldn't that lead to hitting a local maximum / minimum and getting stuck there?


This is just armchair quarterbacking. Utterly useless.


Only a Sith deals in absolutes.


Reimplementing LLVM in Rust could make a big difference as well.


Why do you think that? Rust and C++ are reasonably close in performance.


Not the OP here, but AFAIK LLVM has been struggling with parallelizing compilation of independent functions due to some shared global context that ideally wouldn't be there. A full rewrite would allow a redesign of this part in a more concurrency-friendly way. So it's conceivable that a concurrency-oriented rewrite would bring nice speedups in wall time, not total CPU time. And Rust might give some more guarantees that there really aren't any hidden shared corners.


Given that parallelizing rustc is also kinda struggling, it seems unfair to think that C++ is necessarily the culprit in LLVM case.


In Firefox there've been two failed attempts to make css styling parallel while using C++. Only the parallel rust rewrite, stylo, succeeded.

Rust doesn't replace careful planning of parallel infrastructure or performance optimization, but it makes it possible to maintain the parallel system.


I’m already running make in -j8 and running multiple invocations of clang in parallel, why would I want each one of them using up 800% CPU? Effort spent making a single invocation of clang saturate all my CPU cores is misguided IMO, and the wrong place to be doing parallelization.


Whole-program optimization at link time (cooperating with the Make jobserver protocol, preferably).


Rust isn't going to make fast, lock free, shared memory data structures magically appear.


I doubt this, and the amount of effort this would take would be a huge waste.


LLVM has devolved into complete garbage in Xcode for large Swift projects. Slowness aside, at least half the time it won't display values for variables after hitting a breakpoint, and my team has to resort to using print statements to debug issues.


Are you debugging optimized builds? I have definitely had variables be unavailable at debug time, due to optimization in C in lldb. I would guess that the same could be true for swift's integration.


Clang loses a lot of information for values that are still available or computable in its optimization passes. It's not purely the values being completely lost to optimization.

GCC continues to emit relatively better debuginfo at similar optimization levels. Samy Al Bahra has written and talked about this a couple of times over the years.


I dunno, I see the same issue in GCC, even at -Og. Both compilers will aggressively mark variables dead and reuse their (memory, registers) as soon as possible. Just because it's in scope doesn't mean it's still live.


Yes, scope and liveness are not exactly the same. No, that does not mean the scoped value cannot be restored cheaply. DWARF can express the value of a variable in terms of expressions that do computations on other registers and/or access memory; it does not have to be as simplistic as "this value lives in this register for some period of time." Clang (and to a lesser extent, GCC) fail to do that for non-live, in-scope registers much of the time. Clang in particular just loses that metadata in many optimization passes.
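
To make that concrete, here is a rough sketch in plain C++ (not actual DWARF, and not what any particular compiler is guaranteed to emit). The function names are made up. The second function imitates what an optimizer might do to the first, and the third shows the kind of computation a location expression could encode so the debugger can still print `i`.

```cpp
#include <cstddef>

// As written by the developer: `i` is the variable you'd want to inspect.
long sum_indexed(const int* base, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += base[i];
    return total;
}

// Roughly what the loop may look like after optimization: the index is
// gone and only a moving pointer is kept in a register.
long sum_pointer(const int* base, std::size_t n) {
    long total = 0;
    for (const int* p = base; p != base + n; ++p) total += *p;
    return total;
}

// `i` is no longer live, but it is still cheaply computable from values
// that are: the element distance between the moving pointer and `base`.
// A debug-info expression could describe exactly this arithmetic.
std::size_t recover_index(const int* base, const int* p) {
    return static_cast<std::size_t>(p - base);
}
```

If the pass that rewrites the loop drops the variable's debug metadata instead of rewriting it in terms of `p` and `base`, the debugger reports the value as unavailable even though nothing was really lost.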


Anecdotally gcc + gdb does release with debug info builds a lot better than clang + lldb. In general quality of debug info is one place I have had fewer issues with gcc/gdb than I have with clang/lldb.


Given that it works reasonably well for rust and c++, maybe that's more a problem on the Xcode or Swift side?


That would be the LLDB Swift integration specifically.


That's probably a swift issue rather than LLVM. LLDB works great for me on C code.


Perhaps you should try using JetBrains AppCode, from the company that employs the author of this article.


The Rust is 10% slower metric I think is unfair. If you look on godbolt, the LLVM IR that rustc emits isn't that great, so LLVM has to take some extra time to optimize that, compared to the output of clang.


It's not 10% slower compared to clang but slower compared to the prior version of LLVM. That comparison IS fair as LLVM specifically invites people to target it.


As a general rule, "Make X Y again" is worth avoiding - it has connotations that are likely to distract from the message you are trying to put across.

This is a great post full of really interesting technical details. Don't be put off by the title!


All: come on you guys, please stop the offtopicness now.


One has to admit that the parent commenter is correct in that the title does detract from the actual message, inviting off-topic comments.

It could have been easily avoided by picking a different title. Pointing this out might save others from making the same mistake.


The parent comment started the distraction it was complaining about. This is common enough on the internet that there's a name for it: concern trolling. Of course I don't mean that the GP was trolling in the classical sense of trying to ruin the thread, but we have to judge these things by effects (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...), not intent, and the effects were clear.

If you care about threads not becoming distracted, the primary thing to cultivate is restraint. There's also downvoting and flagging. And anyone who notices a major distraction in a thread is very welcome to email hn@ycombinator.com so we can look into it. That doesn't have to be a political flamewar—it could be just a generic tangent sitting as the top subthread.


I didn't even think about the possible similarity until you pointed it out.

Perhaps it's just that I'm not letting politics take over my life.


[flagged]


Please don't take HN threads further into political flamewar. It's not what this site is for, and this is extremely off topic no matter how right you are.


I know, this site is for having horrible political opinions and never being challenged on them because that is a "flamewar".


> politics aren't focused on making your life worse, or making you feel unwelcome.

Oh they are, I just prefer to focus on good things rather than bad things.

> things are pretty good for you.

Things are pretty good for me, but it's not because of politics.

I'm not refuting your generalization about the slogan and how some people perceive it. It's definitely not good that we're in a situation where this is happening.

Please be very careful about assuming things about other people, else you will fall into the same trap as people of hate. I never indicated that the slogan and its background don't apply to me, and I'm not dismissing it.

I don't want this to become an argument or to be dismissive of anyone's views. I just want to make sure that people don't drag politics into a place where it doesn't belong, and don't make bad assumptions which don't help at all.


> Oh they are, I just prefer to focus on good things rather than bad things.

If you can afford not to focus on them, then trust me, they are absolutely not focused on doing that. Your experience is very different from that of those who are actually suffering from them.

> Things are pretty good for me, but it's not because of politics.

Only because you class those things that go in your favour as "not politics" and only those that challenge your position as "politics".


It's become a well-known catch phrase now. It's only a distraction if you find yourself offended by it, which frankly with everything that's going on right now, is pretty thin-skinned.


As it currently stands, 15 out of 24 comments in this thread are about the phrasing in the title. It has demonstrably become a distraction, at least in this post.


[flagged]


It's not just a title, it's what the article is about: making LLVM fast again is the whole point of the author's effort. He presents reasons for the regressions in this area over time (compile-time regressions are not tracked, whereas regressions in the performance of the produced binary are). This is a real problem: Julia, for example, is heavily dependent on LLVM compile times, and the Julia authors mention that every time they bump the LLVM version they're scared about how much performance will degrade.


Being offended by things is dangerous to our culture.

You can stand up for the rights and freedoms of others without turning your nose.

Showing that something doesn't bother you reduces the power bullies and oppressors have over you.

You can even reframe it. Make America Socialist Again. See what I did there?


> Being offended by things is dangerous to our culture.

You kinda sound offended. Is there a Popperian analogue here, that we must be unoffended by everything but offense itself?


I'm not offended, and I don't mind if I get pushback. This is just where I am in my current understanding of life and things.

Hostility breeds hostility. Cultural segregation raises the barrier for empathy.

I don't always practice what I preach, though. In a recent thread I admitted a preference for language that puts censors on edge.


[flagged]


Language evolves. It is time to accept that certain slurs evolved into just general-purpose insults.


Once again: if you are writing an article about LLVM performance, and you have the choice between the title "Make LLVM fast again" and something else, you will likely have more success with your article if you pick something else.

Do you disagree with the advice I'm giving here?


My mind did not immediately jump to MAGA, and I'm an American. It's not a significant issue for most people.


You're the target audience for my comment then!

I'm advising writers that some people reading their content may make this association, so they should avoid the phrase unless that is their deliberate intent.


I personally don't disagree (and don't like that you're being downvoted for making a point), I just think the delta between the two is so small that it's not even worth discussing, let alone self-censoring for.


You read it despite this sensitivity. What audience are you describing that won't read this based on the title? What is the net loss to the author and to them of them missing out on the content because of their decision? What is the long term implication of developing a social contract that requires you avoid any speech that is similar to any speech of political platforms that some portion, potentially a majority, of your audience opposes?

It sounds like a linguistic death spiral to me. Good luck.


...yet it's exactly the kind of clickbait that will make many visit...


Thanks for saying it, even if it lands you at the bottom of the thread. Agreed completely. It's not about "being offended" (whatever that even means) but about being in poor taste. Nobody would be surprised if a post titled "All Drives Matter" elicited strong negative reactions from readers, even if it was "literally about drive selection in a RAID array" or whatever. Consciously or not, this title is doing exactly the same thing.

Language has context and meaning outside of its immediate definition and will affect the perception of your written word. Don't make people think of authoritarianism and kids huddled in cages if you want to share your cool technical insights with the world.


[flagged]


Exactly. It's not associated with Trump - so why not pick a title that doesn't risk people even having to ask that question?

The author of this piece is based in Berlin. I'm certain they weren't thinking about any negative potential connotations that this title could hold - they were just reusing a common phrase.


But why change it then? If the article has no connection, then the people complaining are the issue, they should fix themselves, it's not the author's job to cater to that.


By using this title, it is the author, not readers, that brings politics into the discussion. It has a connection, right there in the title.

Maybe it is just a joke, a way to boost clicks, or some kind of myopic view that says technical people are above politics, a mix of that, none of the above, I don't know.

But let's not pretend it was not made on purpose.

Authors constantly adapt their writing to their audience.

Alternative titles that convey more information, invite no political discussion, and don't break HN guidelines:

- "Speeding up LLVM by 10%"

- "Reducing compilation times for LLVM 11"

- "10 optimisations for LLVM" (ok this one is click-baity)

Alternative titles the author would probably have self-censored to cater to audiences:

- "Guess what LLVM? hash tables faster than linear lookup" (accusatory, rude)

- "LLVM getting a Summer Body" (fatphobic)

- "Honey I Shrunk the Compilation Time", "Oh Hi CTMark" (pop-culture is generational, and frankly those jokes are horrible)


Has 'Make sth great again' become a meme?


Yes, and it's a pretty terrible one. It normalizes language used by a... let's say very controversial figure. Transplanting it to contexts where it might make some sense serves to validate the original use.

If someone critical of the US President "ironically" uses the phrase, as in "Foo is slow, so let's make Foo fast again", they inadvertently construct or reinforce a notion of "America was not great, which is why we needed to make America great again" -- something they might not agree with. Also, it's just constantly giving more exposure to someone who already has way too much of it. They would be running a political figure's propaganda for them. Many people will want to do that, but also very many people would not want to do it and should think a bit about a slogan's context before adopting it.

(I'm aware that others across the US political spectrum have used the phrase in the past. It doesn't matter, it is currently very strongly associated with one person.)


Word. I'm working at a very diverse, politically correct European company, yet the copywriters wanted to put out a job ad which said

>make banking simple, intelligent, and personal again!

They said it doesn't have a controversial connotation since we are not recruiting in the US. IMO it absolutely does.


They should hang out on 4chan a bit less perhaps.


What would be a terse way to (not ironically) express the notion that some thing X has gotten worse over time, and that we want to return it to a state where it is objectively the same or better than it was before?


"Back to fast compilation."

If you want a meme: "Faster than before"

Or maybe "Getting LLVM to compile as fast as it used to."

And 10 years ago I would have maybe suggested "Make LLVM fast again" but now it has become so strongly associated with a political party that I avoid it, precisely to avoid turning technical writings into political flamewars.

Or "Reclaiming my compilation time" :-)


I mean, America does have serious issues that make it not nearly as great as it could be.

Trump doesn't seem to be addressing those issues, but there is nothing wrong with the implication that America has lost some of its lustre.

Just like the author of a post called "Make LLVM Fast Again" may have identified a problem without actually fixing it. The first step to improvement is identifying the problem. LLVM has problems, America has problems. Seems fine to me.


"Our patience has its limits: improving LLVM performance"


What a lame comment, the tyranny of moral busybodies enacting general rules on language construction.


I like it. If it’s intentional and kind of cheeky the way I interpreted it then I see no issue. It’s funnier and less bland than some strictly technical alternative. If it’s just an innocent call to action and just coincidentally happened to be a nod to the campaign slogan then that’s fine, too. There’s really nothing worth pointing out here.



