
Disclaimer up-front: my day job is hacking on GCC and related GNU pieces of infrastructure, so that colors my opinions somewhat. I speak only for myself, not for my employer.

My level of experience in both: I hacked a specialized LLVM inlining pass in graduate school. I've written two middle-end passes for GCC and tweaked GCC's inliner in similar, though less invasive, ways. I found the level of difficulty to be similar between the two projects.

Other people may have different opinions. I remember reading about a graduate-school project that spent four months trying to get started on a middle-end optimization pass in GCC and got nowhere; after switching to LLVM, they made progress in a month or so. Personally, I think that means they weren't trying very hard with GCC.




Given your experience with both, what would you consider the pain points of getting a grasp of each when looking to contribute? I know when I looked at GCC, the mix of custom manual memory management (ggc_free and pals), semi-automatic management via obstacks, and automatic management via garbage collection seemed baroque and frightening.


I don't have a good sense of pain points in LLVM; the inliner hacking was a while ago, I don't remember many of the details, and LLVM has surely changed quite a bit since then.

As for GCC, I think the pain points are twofold. First, the documentation for the middle-end is somewhat scattered. I honestly think enough information to figure things out is present; it's just not always obvious where to look. There are lots of other passes to study, too, which can be extremely helpful, but the assumptions of an interface, or its side effects, are not always stated, which can be surprising at times. Second, contributing upstream: you're going to get dinged on formatting, documentation (usually just "did you do it"; the review is generally not as thorough as, say, GDB's documentation review), compilation time, and so on.

Also, GCC's hash tables (htab_t) are a pain to use correctly.
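To illustrate the htab_t awkwardness, the usual lookup-or-insert pattern with libiberty's hashtab.h looks roughly like this. This is a from-memory sketch, not a compilable example; `key` and `make_entry` are hypothetical placeholders:

```c
#include "hashtab.h"   /* libiberty */

/* Create a table keyed on pointer identity; the last argument is an
   optional per-element delete callback (none here).  */
htab_t cache = htab_create (31, htab_hash_pointer, htab_eq_pointer, NULL);

/* htab_find_slot hands back a void **slot rather than the element
   itself, and INSERT may resize the table, invalidating any slots
   you were still holding from earlier calls.  */
void **slot = htab_find_slot (cache, key, INSERT);
if (*slot == NULL)
  *slot = make_entry (key);   /* hypothetical constructor */
```

The double-indirect slots, the INSERT/NO_INSERT distinction, and keeping the hash and equality callbacks consistent are where people typically trip up.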

Of course, my experience is somewhat slanted towards the middle-end; the set of pain points is somewhat different if you are working in the front-end or the back-end. And my set of pain points from the middle-end might be different if I had worked on different optimization passes. (My passes cared very little about things like aliasing, for instance.)

It's surprising to me that you mention memory management as a pain point, though I can see how the variety can be bewildering. The only distinction that really matters is between GC'd and non-GC'd memory. Obstacks and alloc pools are just ways of providing specialized malloc interfaces. A useful rule of thumb is that if your data is only needed for one pass of the compiler, you can allocate it any way you like; if the data is longer-lived than that, it needs to go in GC'd memory. I can elaborate if you'd like, but that's the basic idea.

FWIW, I agree that the whole GC system is somewhat baroque. The GC was a decent solution to a real engineering problem: it cleaned up memory management and provided the basis for precompiled headers. But it causes problems in other ways nowadays, and trying to get rid of it would be a huge effort.





