Hacker News
Why IBM Now Views LLVM As Being Critical Software (phoronix.com)
86 points by protomyth on May 7, 2013 | 36 comments



A few years ago, it was hard to imagine that there would be a breakout compiler that would get developers excited.

It goes to show the value in competition and a fresh perspective. Error messages, modularity, and licensing could unseat GCC, which is pretty amazing.

(I know GCC still has many advantages, but clang/llvm clearly have made a lot of progress.)


In a way there is some aspect of history repeating itself.

The current compiler known as "GCC" is a direct descendant of an earlier GCC fork named EGCS:

http://en.wikipedia.org/wiki/GNU_Compiler_Collection#EGCS_fo...

So GCC was already "unseated" once, the difference being by a direct descendant instead of by a fully different compiler.


Oh man, memories of having gcc and egcs both installed with different support levels for C++ :(


And also running Gentoo unstable branch with that nonsense.


What I find very positive about clang/llvm is:

- Everyone now seems to be using LLVM for language prototyping instead of JVM/.NET, clearly positive for native compilation and to show people without compiler knowledge that managed != VM

- The compiler as a library implementation of clang allows C, C++ and Objective-C to enjoy tooling support similar to JVM and .NET languages

- As someone with a soft spot for the strong typing found in the Pascal family, the integrated static analysis is just great


Does it have a good story for doing precise GC?

When I looked at it some time ago, it appeared that I'd have to understand and write a lot of C++ code ... which I very much don't want to do.


No, LLVM's support for precise GC is not great. You can do it, but you have to spill all registers that might contain pointers across all calls, which is a significant performance hit. (Note that GCC is the same way though, as I understand it.)

In my opinion, you're usually better off just doing conservative-on-the-stack GC if you're using LLVM. (This is what Rust does.)

The basic problem is that LLVM loses the distinction between pointers and integers as soon as it drops into SelectionDAG and later passes. Fixing this would be a lot of work.
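
To make the conservative-on-the-stack idea concrete, here is a toy Python sketch. It is only an illustration under stated assumptions, not how LLVM or Rust implement anything: the "stack" is modeled as a list of raw words, the "heap" as a dict from address to child addresses, and the names `conservative_roots` and `mark` are hypothetical. Any stack word that happens to equal a valid object address is treated as a root, while tracing inside the heap stays precise.

```python
def conservative_roots(stack_words, heap):
    """Treat every stack word that looks like an object address as a root."""
    roots = set()
    for word in stack_words:
        if word in heap:  # could be a pointer, could be an unlucky integer
            roots.add(word)
    return roots


def mark(roots, heap):
    """Trace precisely from the roots; heap maps address -> child addresses."""
    live = set()
    worklist = list(roots)
    while worklist:
        addr = worklist.pop()
        if addr in live:
            continue
        live.add(addr)
        worklist.extend(heap[addr])
    return live


# Toy heap: 0x1000 points to 0x2000; 0x3000 and 0x4000 have no children.
heap = {0x1000: [0x2000], 0x2000: [], 0x3000: [], 0x4000: []}

# The stack holds plain integers and pointers with no way to tell them apart.
stack = [42, 0x1000, 0x3000, 7]

live = mark(conservative_roots(stack, heap), heap)
```

In this model 0x3000 is kept alive even if the stack word was really just an integer, which is the usual price of conservative scanning: some garbage may be retained, but no register spilling or stack maps are needed.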


If you're willing to jump through some hoops, you don't have to spill. You can recover the pointers on the stack by making all function calls return a flag indicating whether it's a normal return or a stack scan for GC. If a function returns the stack scan flag, you register the local variables in the current stack frame to the GC, and then you return the stack scan flag from that function. This way it unrolls the entire stack and you scan everything.

Of course you've now destroyed the stack, so either you have to copy the stack before this process starts and later restore it, or you introduce stack restoring functions as a source to source transformation. The former is probably simpler and faster, but the latter is neat because it doesn't use any hacks like copying the stack, and because it allows you to restore only part of the stack and keep part of it in the heap and restore it lazily, which is important for GC pause times if you have very deep stacks.

The disadvantage of this approach compared to the ideal is that now all function returns get slower, because they have to check the flag. However it's probably better than spilling everything, especially if you inline enough so that function calls are infrequent.

Example:

    def f(...):
       var x;
       var y;
       ...
       z = g(...)
       ...

becomes:

    def f(...):
       var x;
       var y;
       ...
       z = g(...)
       if STACKSCAN:
         register_gc_root(x)
         register_gc_root(y)
         return // scan parent stack frame
       ....
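
A runnable Python sketch of the same transformation, for illustration only. The sentinel `STACKSCAN`, the `trigger_gc` parameter, and the explicit `roots` list are assumptions made to keep the example self-contained; a real implementation would do this in generated code, and it would still need to restore the unwound stack as described above.

```python
STACKSCAN = object()  # sentinel meaning "this is a stack scan, not a normal return"


def g(trigger_gc, roots):
    local = "g-local"
    if trigger_gc:            # stand-in for "an allocation needs a collection"
        roots.append(local)   # register this frame's variables with the GC
        return STACKSCAN      # start unwinding
    return 1


def f(trigger_gc, roots):
    x = "f-x"
    y = "f-y"
    z = g(trigger_gc, roots)
    if z is STACKSCAN:        # the check added after every call
        roots.append(x)
        roots.append(y)
        return STACKSCAN      # propagate so the caller scans its own frame
    return z + 1


roots = []
normal = f(False, roots)      # normal path: no roots are registered
scanned = f(True, roots)      # scan path: every frame registers its locals
```

The extra `if z is STACKSCAN` after each call is the per-return cost mentioned above.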


Another cool LLVM-based project with garbage collection, the Epoch Programming Language [0], uses a similar approach [1]. Original reddit submission here [2].

[0] https://code.google.com/p/epoch-language/

[1] https://code.google.com/p/epoch-language/wiki/GarbageCollect...

[2] http://www.reddit.com/r/programming/comments/1c5jz6/realisti...


GHC Haskell has precise GC via both the LLVM and native code backends.

That said, it doesn't (I think) use any of the LLVM hooks. Instead, the GC machinery is baked into the relevant primops. The hand-written primops can be found here: https://github.com/ghc/ghc/blob/master/rts/PrimOps.cmm#L51 (the low-level primops that aren't compiler-generated are written in Cmm, a GHC-specific dialect of C--).


Sadly, not yet.


More than a few years ago, it was very easy to imagine a breakout compiler getting developers excited. I think it was called "egcs" or somesuch . . .

Joking aside, I'm glad for the competition, and for the options. It's always nice to have another automated way to audit code: throw another compiler with maximum warnings at it.


To be fair: egcs was a fork. It was a group of GCC people deciding that they wanted to work on the GCC code outside the control of the GCC project administration. And as it happened they were better at it and the FSF essentially took egcs back into the fold by handing over control of "GCC" to the forked project.

LLVM/clang is a rather different thing, for both good and bad reasons.


The most interesting thing for me is the library approach to the compiler. I wonder if it would benefit some other open source languages to restructure this way. It is a nice idea to be able to use the same compiler libraries in other tooling.


While the separation of concerns in LLVM is laudable, I think that too many developers aren't even aware of some of the nice features in GCC. I was pleasantly surprised to find that Go support is now an option in building GCC, and I wouldn't be surprised if its fast adoption was in no small part due to the modularity of GCC (I've unfortunately not had the time to track development of GCC or LLVM for years). It still surprises me how many C/C++ developers don't know about _FORTIFY_SOURCE, -fstack-protector, or even -fmudflap. Consider also that coverage and profiling are built in, and GCC offers a nice set of tools for professional software development. They're just not well advertised. I'm grateful for LLVM, but I wonder if some of the excitement over it is not just hype and it being the new hotness.

EDIT: Can't believe I used "it's" instead of "its".


The most likely reason gcc was used was that Ian Taylor, the guy who is part of Google's core team and wrote gccgo, was already familiar with gcc by virtue of being a long-time gcc developer.

There already is a project to build a go compiler on top of llvm (https://github.com/axw/llgo).


The official reason as mentioned in Go's FAQ (http://golang.org/doc/faq):

> We also considered using LLVM for gc but we felt it was too large and slow to meet our performance goals.

Does this mean compilation speed, performance speed, or both? Or is this a hand-wavy excuse to use gcc instead due to Ian's expertise?

That being said, there is a project that builds Go on top of LLVM:

https://github.com/axw/llgo


> Does this mean compilation speed, performance speed, or both? Or is this a hand-wavy excuse to use gcc instead due to Ian's expertise?

IIRC, that remark related to using LLVM or GCC as backends for the Go compiler. Basically, they found both backends to be too large and slow for their liking, so they wrote their own full compiler from scratch.

I believe the GccGo project is mainly about leveraging all the architectures which GCC supports, many of which Go isn't ported to and possibly won't be ported to (as in maintained by the Go developers).


I guess my question would be "Can I use the same parser that GCC uses in my tool?". I am most interested in that aspect of clang.


This is, in large part, something the FSF did not want to happen without the app being GPLv3.

The other interesting thing is that this is only really an issue due to difficulty in properly parsing/semantically analyzing languages like C++.

For languages like Java, it simply isn't necessary, or in some cases, desirable.


"For languages like Java, it simply isn't necessary, or in some cases, desirable."

Why would it not be desirable? Also, if I have an idea for a great tool, why should I have to write my own parser? I have more faith the actual compiler will have bugs fixed quicker in this regard.


If your goal is non-compiler analysis, like refactoring or verification, it's often more flexible to use a framework specifically designed to analyze source code and give you a nice AST with various options, rather than pulling the parser out of a compiler front-end. For languages with reasonably simple grammar there are usually one or more such packages, usually pretty well debugged. For example, for Java you could use http://spoon.gforge.inria.fr/


As a counter-example/argument, CEDET/Semantic (http://cedet.sourceforge.net/semantic.shtml) will use Clang/LLVM and/or GCC if you tell it to. It only makes sense, if someone has already gone to the trouble of not only writing a full up parser and lexer for your language, but also exposing things through modularization, you might as well use it.


Again, the argument I've made is that this is only because C/C++ are so difficult to otherwise parse and auto-complete. Can you find a Java example (for example :P)? AFAIK, all the Java IDEs use their own incremental parsers and completers.


It wouldn't be desirable in a lot of cases for the reason the other person mentioned. I don't believe you should have to write your own parser mind you, just that your compiler's parser may not be the one you want.

In general, most languages are stable enough that you don't really get parsing/semantics bugs, so you wouldn't really need to worry about your last concern.

C++ is simply so complicated that the implementations end up with bugs due to the necessary complexity of implementation, but this is not generally true of languages.


Semantic in CEDET (http://cedet.sourceforge.net/semantic.shtml) will do this with both GCC and Clang/LLVM.


If you use code from GCC and distribute the combined work, then you must adhere to the license of GCC.

With Clang, the same is true, but the license requirements are fewer.


  http://docs.python.org/2/library/ast.html

  http://golang.org/pkg/go/parser/
  (See also gofmt: http://golang.org/cmd/gofmt/
   and gofix: http://golang.org/cmd/fix/ )
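
As a small illustration of what those links offer, here is a sketch that uses Python's standard `ast` module to reuse the language's own parser in a tool. The sample source and the variable names are made up for the example; only `ast.parse`, `ast.walk`, and the node classes come from the standard library.

```python
import ast

# Hypothetical source text that a tool might want to analyze.
source = """
def add(a, b):
    return a + b

def main():
    print(add(1, 2))
"""

# Parse with the same parser the interpreter uses.
tree = ast.parse(source)

# Collect the names of all function definitions in the tree.
function_names = [
    node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
]

# Collect the names of functions called by simple name, e.g. add(...) or print(...).
call_names = [
    node.func.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
]
```

`ast.walk` does not promise a traversal order, so a tool that cares about ordering would sort the results or use `ast.NodeVisitor` instead.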


It depends on what you mean by a few years ago. Even 5-6 years ago, it was clear to a bunch of GCC devs this was going to be the result :)

The need and desire was very clear, and it was clear LLVM would meet that.


Are there any strong technical reasons why GCC can't just copy any useful features from Clang, such as the improved error messages, and put them into GCC? License-wise, there shouldn't be any problems with putting permissive code inside a copyleft project (except maybe politics).


Sure: Generating good diagnostics requires a large quantity of infrastructure, much of which is inherent to the compiler's design and not something readily ported piecemeal. That's not to say one compiler won't learn and benefit from changes in the other, but reimplementing features in a very different design is more than a simple "copy".


What is an example advantage of gcc? (old parallelizing compiler person asking, but not working on such for years, randomly)


OpenMP support.


Intel has recently released a permissively-licensed OpenMP runtime (http://www.openmprtl.org/) though, and Intel's compiler team is working on OpenMP support (http://software.intel.com/en-us/forums/topic/391189, slides at http://llvm.org/devmtg/2013-04/bokhanko-bataev-slides.pdf) in Clang/LLVM using it. I suspect it won't be long before there's parity on this point.


Thanks.


While Clang/LLVM is cool, this is more of a pure business decision.

IBM supports multiple compiler toolchains: XL C/C++/FORTRAN, GCC, Java, COBOL, etc...

So, when a customer comes by with a codebase and wants to run it on their System x/System p/System z big iron, IBM's response needs to be a simple "Yes" when it comes to support.

Customer A has used XL since the epoch of computing, cool, that's supported.

Customer B has a codebase tuned and littered with GCC attributes, cool, that's supported as well.

Customer C wants Clang/LLVM because devs find it easier to debug and it's growing in features, cool, that will be supported as well.



