- Everyone now seems to be using LLVM for language prototyping instead of the JVM/.NET, which is clearly positive for native compilation, and it shows people without compiler knowledge that managed != VM
- The compiler-as-a-library implementation of clang allows C, C++, and Objective-C to enjoy tooling support similar to that of JVM and .NET languages
- As someone with a soft spot for the strong typing found in the Pascal family, the integrated static analysis is just great
No, LLVM's support for precise GC is not great. You can do it, but you have to spill all registers that might contain pointers across all calls, which is a significant performance hit. (Note that GCC is the same way though, as I understand it.)
In my opinion, you're usually better off just doing conservative-on-the-stack GC if you're using LLVM. (This is what Rust does.)
The basic problem is that LLVM loses the distinction between pointers and integers as soon as it drops into SelectionDAG and later passes. Fixing this would be a lot of work.
If you're willing to jump through some hoops, you don't have to spill. You can recover the pointers on the stack by making all function calls return a flag indicating whether it's a normal return or a stack scan for GC. If a function returns the stack scan flag, you register the local variables in the current stack frame to the GC, and then you return the stack scan flag from that function. This way it unrolls the entire stack and you scan everything.
Of course you've now destroyed the stack, so either you have to copy the stack before this process starts and later restore it, or you introduce stack restoring functions as a source to source transformation. The former is probably simpler and faster, but the latter is neat because it doesn't use any hacks like copying the stack, and because it allows you to restore only part of the stack and keep part of it in the heap and restore it lazily, which is important for GC pause times if you have very deep stacks.
The disadvantage of this approach compared to the ideal is that now all function returns get slower, because they have to check the flag. However it's probably better than spilling everything, especially if you inline enough so that function calls are infrequent.
Example:

    def f(...):
        var x;
        var y;
        ...
        z = g(...)
        ...

becomes:

    def f(...):
        var x;
        var y;
        ...
        z = g(...)
        if z == STACKSCAN:
            register_gc_root(x)
            register_gc_root(y)
            return STACKSCAN  // scan parent stack frame
        ...
Another cool LLVM-based project that has garbage collection is the Epoch Programming Language [0], which uses a similar approach [1]. Original reddit submission here [2].
GHC Haskell has precise GC via both the LLVM and native code backends.
That said, it doesn't (I think) use any of the LLVM hooks. Instead the GC machinery is baked into the relevant primops. The hand-written primops can be found here: https://github.com/ghc/ghc/blob/master/rts/PrimOps.cmm#L51
(The low-level primops that aren't compiler-generated are written in Cmm, a GHC-specific dialect of C minus minus.)
More than a few years ago, it was very easy to imagine a breakout compiler getting developers excited. I think it was called "egcs" or somesuch . . .
Joking aside, I'm glad for the competition, and for the options. It's always nice to have another automated way to audit code: throw another compiler with maximum warnings at it.
To be fair: egcs was a fork. It was a group of GCC people deciding that they wanted to work on the GCC code outside the control of the GCC project administration. And as it happened they were better at it and the FSF essentially took egcs back into the fold by handing over control of "GCC" to the forked project.
LLVM/clang is a rather different thing, for both good and bad reasons.
The most interesting thing for me is the library approach to the compiler. I wonder if it would benefit some other open source languages to restructure this way. It is a nice idea to be able to use the same compiler libraries in other tooling.
While the separation of concerns in LLVM is laudable, I think that too many developers aren't even aware of some of the nice features in GCC. I was pleasantly surprised to find that Go support is now an option in building GCC, and I wouldn't be surprised if its fast adoption was in no small part due to the modularity of GCC (I've unfortunately not had the time to track development of GCC or LLVM for years).

It still surprises me how many C/C++ developers don't know about _FORTIFY_SOURCE, -fstack-protector, or even -fmudflap. Consider also that coverage and profiling are built in: GCC offers a nice set of tools for professional software development. They're just not well advertised. I'm grateful for LLVM, but I wonder if some of the excitement over it is not just hype and it being the new hotness.
EDIT: Can't believe I used "it's" instead of "its".
The most likely reason gcc was used is that Ian Taylor, the guy who is part of Google's core team and wrote gccgo, was already familiar with gcc by virtue of being a long-time gcc developer.
>Does this mean compilation speed, performance speed, or both? Or is this a hand-wavy excuse to use gcc instead due to Ian's expertise?
IIRC that remark related to using LLVM or GCC as the backend for the Go compiler; basically, they found both backends to be too large and slow for their liking, so they wrote their own full compiler from scratch.
I believe the GccGo project is mainly about leveraging all the architectures which GCC supports, many of which Go isn't ported to and possibly won't be ported to (as in maintained by the Go developers).
"For languages like Java, it simply isn't necessary, or in some cases, desirable."
Why would it not be desirable? Also, if I have an idea for a great tool, why should I have to write my own parser? I have more faith the actual compiler will have bugs fixed quicker in this regard.
If your goal is non-compiler analysis, like refactoring or verification, it's often more flexible to use a framework specifically designed to analyze source code and give you a nice AST with various options, rather than pulling the parser out of a compiler front-end. For languages with reasonably simple grammar there are usually one or more such packages, usually pretty well debugged. For example, for Java you could use http://spoon.gforge.inria.fr/
As a counter-example/argument, CEDET/Semantic (http://cedet.sourceforge.net/semantic.shtml) will use Clang/LLVM and/or GCC if you tell it to. It only makes sense: if someone has already gone to the trouble of not only writing a full parser and lexer for your language but also exposing things through modularization, you might as well use it.
Again, the argument I've made is that this is only because C/C++ are so difficult to otherwise parse and auto-complete.
Can you find a java example (for example :P)?
AFAIK, all the Java IDEs use their own incremental parsers and completers.
It wouldn't be desirable in a lot of cases for the reason the other person mentioned.
I don't believe you should have to write your own parser mind you, just that your compiler's parser may not be the one you want.
In general, most languages are stable enough that you don't really get parsing/semantics bugs, so you wouldn't really need to worry about your last concern.
C++ is simply so complicated that implementations end up with bugs due to the necessary complexity of implementation, but this is not generally true of languages.
http://docs.python.org/2/library/ast.html
http://golang.org/pkg/go/parser/
(See also gofmt: http://golang.org/cmd/gofmt/
and gofix: http://golang.org/cmd/fix/ )
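To make the point concrete, with Python's stdlib ast module a small source analysis (here: listing every function definition with its line number, a building block for lint or refactoring tools) takes only a few lines, with no compiler internals involved. The example source is, of course, made up:

```python
import ast

# Parse source into an AST and walk it -- the sort of thing a
# refactoring or lint tool needs, without touching CPython's compiler.
source = """
def greet(name):
    return "hello " + name

def farewell(name):
    return "bye " + name
"""

tree = ast.parse(source)
functions = [(node.name, node.lineno)
             for node in ast.walk(tree)
             if isinstance(node, ast.FunctionDef)]

print(functions)  # [('greet', 2), ('farewell', 5)]
```

go/parser gives you much the same thing for Go, which is exactly what gofmt and gofix are built on.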
Are there any strong technical reasons why GCC can't just copy useful features from Clang, such as the improved error messages, and put them into GCC? License-wise, there shouldn't be any problems putting permissive code inside a copyleft project (except maybe politics).
Sure: Generating good diagnostics requires a large quantity of infrastructure, much of which is inherent to the compiler's design and not something readily ported piecemeal. That's not to say one compiler won't learn and benefit from changes in the other, but reimplementing features in a very different design is more than a simple "copy".
So, when a customer comes by with a codebase and wants to run it on their System x/System p/System z big iron, IBM's response needs to be a simple "Yes" when it comes to support.
Customer A has used XL since the epoch of computing, cool, that's supported.
Customer B has codebase tuned and littered with GCC attributes, cool, that's supported as well.
Customer C wants Clang/LLVM because devs find it easier to debug and it's growing in features, cool, that will be supported as well.
It goes to show the value in competition and a fresh perspective. Error messages, modularity, and licensing could unseat GCC, which is pretty amazing.
(I know GCC still has many advantages, but clang/llvm clearly have made a lot of progress.)