Hacker News new | past | comments | ask | show | jobs | submit login
Designing a garbage collector in Rust (manishearth.github.io)
127 points by Manishearth on Aug 31, 2015 | hide | past | favorite | 17 comments



This article is a great read! The article makes it seem as though once you had a valid theoretical model implementation was actually easy, is that true? It sounds so simple it might even be implemented in tens of lines of Rust, or is there a lot of hidden complexity you skipped over in the article?

In your paragraph about rooting you mention how language support might enable a better implementation (using llvms rooting), but the approach you took sounds very solid. What is the disadvantage of the way you do it now?

Edit: just scrolled through the implementation, it really is as short as the article made it seem, just a couple hundred lines! Now I'm really curious to its performance compared for example with that of Java or Go.


> This article is a great read!

Thanks! Glad you enjoyed it!

> The article makes it seem as though once you had a valid theoretical model implementation was actually easy, is that true?

Rust's affine type system is a huge win here. It has a no-surprises approach to handling data, so it's very easy to practically track roots and all.

So yeah; the implementation is not much more complex than the design above (I don't think I skip any nontrivial part of the design in the post, though that is purely by accident -- I intended to give a sketch of the design and the whole design just happened to come out)

> What is the disadvantage of the way you do it now?

Think of manual rooting as akin to Rc or shared_ptr wrt cost; and the llvm thing is Box/unique_ptr.

Just how the compiler inserts deterministic alloc/free calls with Box, the compiler keeps track of stack based gcthings in llvm. All of the dynamic checks above for rooting become unnecessary; instead at mark time you just pick up the roots from the stack frames. (Note: I haven't looked into the llvm feature; I don't know if it works this way).

Tracking roots is something a compiler is good at. Not so much a runtime.

Regarding Java and Go: I don't know; I haven't tried. But the applications are different -- Java is pervasive, and Go is pervasive with value types and escape analysis (also default concurrent). Rust-gc will probably be horrible at pervasive GCing. Besides, the other two have compiler support so have it much easier when they want to track roots. You can statically insert unconditional calls to `roots.push(var)` if you own the compiler, but as a library author you can have to keep track of many things at runtime to determine if something is a root (which is why llvms solution looks promising to me). In rust-gcs case, rooting status is stored by all the other existing gcptrs, and the status is changed conditionally depending on their states and interaction.


You may find these two article interesting prior art, which I happen to be very familiar with. :)

Implementation and performance evaluation of a safe runtime system in Cyclone http://scholar.google.com/citations?view_op=view_citation&hl...

Type-preserving garbage collectors http://scholar.google.com/citations?view_op=view_citation&hl...


Thanks, these look quite interesting! I'll read them when I get time.


See also chapter 7 of Morrisett's thesis:

http://www.cs.cmu.edu/~rwh/theses/morrisett.pdf


I'm not sure how I feel about GC in Rust.

One of the useful things with Rust's ownership model is it forces good architectures. I feel like with a GC it's much easier to be "lazy" about who owns something which seems very counter to what I've found interesting and useful in Rust.


As I've mentioned in the post (anmd the readme, and in multiple discussions), rust-gc is not meant to be used pervasively :)

Given that Rc is rarely/never used pervasively; I'm confident that this won't happen.

For those intending to use GC pervasively: Every time you use Gc<bool>; I kill a kitten. Scratch that, I unleash an AbstractKittenKillerProxyFactory on the world :P

Think of the kittens!


> Cases where one needs to maintain a complicated, dynamic graph are where a GC becomes useful. Similarly, if one is writing an interpreter for a GCd language, having a GC in Rust would simplify things a lot. > > Not to say that one should pervasively use a GC in Rust. Similar to Rc<T>, it’s best to use regular ownership-based memory management as much as possible, and sprinkle Rc/Gc in places where your code needs it.

Hopefully people heed this advice. My guess is it will not be a problem because most example code (and all of the docs) will not use Gc pervasively.


Not only will the example code and docs never use Gc, there won't even be a Gc implementation in the stdlib. There are just too many potential variants of Gc (along with too little concrete demand) to ship any implementation by default.


Besides what kibwen said, the docs don't even use Rc much, so we're safe on that front :)


Using pervasive sharing in Rust is not the easiest route even with a perfect GC, e.g. one is forced to use types like RefCell and Cell anywhere that mutability is required, adding syntactic overhead, more restrictions (especially around threading), slight runtime overhead (mainly for the former; Cell "just" inhibits aliasing-based optimisations, while RefCell does that and stores & manipulates some extra integers) and runtime panics instead of compiletime errors, for RefCell.


That's the thing though, every place I've seen pervasive sharing could be replaced my explicit ownership(including a clean mechanism for transferring ownership).

Being explicit about who owns something makes it much harder to leak things by forgetting to clear references/etc.


Garbage collection, or something like it, is very useful when implementing lock-free data structures, which are very important in systems programming. Typically they get implemented using some kind of RCU (there was even a post about doing that in Rust, recently), but it's good to have options when you're tuning performance at that level.



It's very much opt-in: if you don't want it, it's easy (and the default) to not use it.


On the other hand, GC decouples reclamation from control flow. This means that, for instance, if you drop a reference to an object referencing a million other objects, that reclaiming does not necessarily have to happen instantaneously (i.e., blocking control flow). Instead, a GC can reclaim those objects in the background if desired (and if so designed).


Sometimes I feel rigorous thinking in software is a form of delusion.

(Possibly related: one of the first thing I noted as a student (in architecture) was the fact of the 'beaten path' vs the ideal of the designated path.)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: