Rare Are GC Talks (heroku.com)
138 points by pat_shaughnessy on Aug 4, 2012 | 39 comments



Wilson's highly readable "Uniprocessor Garbage Collection Techniques" is a far better overview: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.... You could write a decent GC with just the info in that paper.

It doesn't cover parallel or real-time garbage collection (both of which get far more complicated); for those, you want Jones et al.'s _The Garbage Collection Handbook_ (http://www.amazon.com/The-Garbage-Collection-Handbook-Manage...) and plenty of time to explore its bibliography. (Jones's older book is also good, but doesn't cover those topics.)


That is a good paper. Thanks for the pointer!


The reference counting metaphor leaves out a critical aspect of reference counting: cyclical structures and how to collect them. Books never write their names in other books, so this doesn't appear in his discussion, even though it's one of the more important complications for real implementations.
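
A minimal C sketch of the problem (the types and names here are invented for illustration): two objects that point at each other keep each other's counts above zero, so neither is ever freed.

    /* Toy refcounting; Obj and obj_* are made up for this example. */
    #include <stdlib.h>

    typedef struct Obj {
        int refcount;
        struct Obj *ref;              /* one outgoing reference */
    } Obj;

    Obj *obj_new(void) {
        Obj *o = malloc(sizeof(Obj));
        o->refcount = 1;              /* caller holds the only reference */
        o->ref = NULL;
        return o;
    }

    void obj_release(Obj *o) {
        if (o && --o->refcount == 0) {
            obj_release(o->ref);      /* release what we point to */
            free(o);
        }
    }

    int main(void) {
        Obj *a = obj_new();
        Obj *b = obj_new();
        a->ref = b; b->refcount++;    /* a -> b */
        b->ref = a; a->refcount++;    /* b -> a: a cycle */
        obj_release(a);               /* a's count: 2 -> 1, never 0 */
        obj_release(b);               /* b's count: 2 -> 1, never 0 */
        return 0;                     /* both objects leak */
    }

Plain reference counting alone never reclaims such structures; CPython, for instance, pairs it with a separate cycle detector.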


Honestly, this article is really bad and probably harmful to new programmers. I recommend against taking anything in it as accurate, much less complete.

The author misses vital issues (like the inherent danger of circular references in reference-counting "GC"). I wouldn't call myself even slightly knowledgeable in GC, yet the blatant holes in this article's clumsy analogies are obvious to me, and it scares me that people are learning from an explanation with massive gaps -- gaps in topics every programmer on earth should know about.

On top of all that, the only "con" of reference counting it mentions is absurd: "It's annoying if you forget to release your references." Wow. Profound stuff there - bugs and incorrect code are annoying. It isn't even a problem if your language supports RAII, in which case it's impossible to "forget" in the first place.


I sincerely doubt people are going to read this and then immediately write and ship a garbage collector. As a square-one, general-idea overview for someone who has no concept of what a GC does, it's more than accurate enough.

It could probably be about half the size, as it shoots off in weirdly complex directions for such an introductory text. And it should probably mention reference cycles, because they're a real problem with reference counting. But it's a far clearer and simpler text than I generally see on this topic.


Extend the metaphor: Mr. B also has to write his name on people who he thinks he will need in the future, lest they be fired.


One of the most fun discussions of the complications of reference counting and how nuanced you have to be comes from the Python docs:

http://docs.python.org/extending/extending.html#reference-co...

It relates "a true story. An older version of Python contained variants of this bug and someone spent a considerable amount of time in a C debugger to figure out why his __del__() methods would fail", and then walks through the subtleties of borrowed versus owned references in some detail.
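
The bug on that page boils down to a borrowed reference going stale; roughly, the docs' own C example is:

    #include <Python.h>

    void
    bug(PyObject *list)
    {
        PyObject *item = PyList_GetItem(list, 0);   /* borrowed reference */

        /* Discards the old list[1]; if that object has a __del__, it can
           run arbitrary Python code, including code that removes list[0]
           and thereby frees the object `item` still points to. */
        PyList_SetItem(list, 1, PyInt_FromLong(0L));
        PyObject_Print(item, stdout, 0);            /* BUG: may be dead */
    }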


At the December 2008 Kyushu Ruby 01 Conference, I asked the audience "How many of you here have some interest in garbage collection?" Out of 200 people, only 3 raised their hands.

Can you imagine how many people would raise their hands if you asked that question NOW at a Ruby conference? Crazy how times change.


Can you explain a bit why that has changed? I don't really know anything about Ruby, but I would have thought that being able to ignore abstractions like garbage collection was the reason people use high-level languages. Does programming in Ruby involve a lot of interaction with the garbage collector, or are you saying that a lot of the Ruby community is actually becoming involved in writing and improving low-level language features?


My experience in this regard comes from Java, not Ruby, but I think the basic concepts are the same: You are correct in that, for a lot of uses, ignoring abstractions like GC is just fine.

If you are building something that uses a lot of memory, requires high throughput or low latency, or are getting into situations where GC is taking up a decent chunk of your CPU time, then it can pay to have a better understanding of the GC concepts so you can make beneficial changes to your application or to the GC parameters.

Plus, stuff like GC is kind of interesting by itself for some people. Some of my favorite sessions at JavaOne were always the GC and JVM internals ones, despite the fact that their immediate utility to me wasn't always obvious.


The MRI Ruby 1.8 garbage collector was sub-optimal for the large web applications it ended up powering, due to both blocking / performance issues and a lack of predictability and instrumentation. In 2009-2011, many high-profile Ruby 1.8 implementation enhancements focused on garbage collection, as the MRI 1.8 GC became a major pain point for a lot of growing high-profile Ruby + Rails shops. Both Ruby Enterprise Edition and Twitter's Kiji were very popular MRI 1.8 enhancements that focused on improving the performance and visibility / instrumentation / debuggability of the garbage collector.

Now that JRuby is much more mature and YARV / Ruby 1.9 has enjoyed wide adoption, more of the Ruby community are able to ignore GC (or only tune it once) - but that simply wasn't possible with MRI 1.8 and a large-scale web app.


One of the biggest problems with the 1.8.x versions of MRI (or CRuby as it's referred to in this article) is that the GC is pretty slow and tends to get called a lot. A good explanation of the reasons behind that can be found here:

http://www.engineyard.com/blog/2010/mri-memory-allocation-a-...

As Rails got more popular and sites built on Rails got bigger, people started running into performance issues that could often be traced back to GC being called over and over during a single web request (instantiating lots of AR objects causes lots of memory allocations, which in turn trigger GC). This led to people exploring other implementations of Ruby (REE, Rubinius, etc.) and, more importantly, becoming a lot more familiar with how GC works in MRI and how to program around its shortcomings.


Another great GC discussion, for LuaJIT: http://wiki.luajit.org/New-Garbage-Collector


Interesting that there's no mention of generational GC in this article. Anyone with a bit more expertise care to comment on why not?


I'm guessing it's either because of scope or because generational collectors are a type of copying collector. Instead of having just from/to spaces that the objects ping-pong between as they fill up, a generational collector copies objects from one heap generation to the next as each fills. Once the oldest generation (the tenured space) reaches capacity, it can't trickle live objects down to another generation, so some other method, like mark-sweep-compact, is used to collect it. You could probably have a half-space tenured generation, but half-space collectors are not space-efficient without resorting to a bunch of virtual-memory mapping trickery, and the tenured space tends to be large.
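
A deliberately toy C sketch of that nursery-to-old promotion step, assuming pointer-free, fixed-size objects and an explicit root table (everything here is invented for illustration):

    #include <string.h>

    #define NURSERY_SIZE 1024
    #define OLD_SIZE     8192
    #define OBJ_SIZE     16

    static char nursery[NURSERY_SIZE], old_space[OLD_SIZE];
    static size_t nursery_top, old_top;

    static void *roots[64];   /* stand-in for "everything reachable" */
    static int nroots;

    /* Minor collection: copy each live nursery object into the old
       generation and update the root; dead objects are never touched. */
    static void minor_gc(void) {
        for (int i = 0; i < nroots; i++) {
            char *p = roots[i];
            if (p >= nursery && p < nursery + NURSERY_SIZE) {
                memcpy(old_space + old_top, p, OBJ_SIZE);
                roots[i] = old_space + old_top;
                old_top += OBJ_SIZE;  /* a real GC would collect the old
                                         space (e.g. mark-sweep-compact)
                                         when this fills */
            }
        }
        nursery_top = 0;   /* the whole nursery is free again */
    }

    /* Allocation is a pointer bump; a full nursery triggers a minor GC. */
    static void *gc_alloc(void) {
        if (nursery_top + OBJ_SIZE > NURSERY_SIZE)
            minor_gc();
        void *p = nursery + nursery_top;
        nursery_top += OBJ_SIZE;
        return p;
    }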


It isn't mentioned because the author chose not to include it.

Whether that's because this isn't meant to be a complete survey of the field and the author didn't think it fit the introductory nature of the piece, or because it didn't fit within the time limits of the talk this article is based on, or because Ruby doesn't do generational optimizations and the author saw no reason to wander down that path, isn't a question that expertise in gc algorithms gives much insight into.


Yeah, I noticed that. I first heard of generational GC in association with Smalltalk VMs and I think that's where they made their debut.


The first generational garbage collector was for Lisp:

http://web.media.mit.edu/~lieber/Lieberary/GC/Realtime/Realt...


Articles like this are exactly why I come to HN.


Agreed! Also, beyond the great tech content there's something charming about this one.


In the article he states "[for Copying] Conservative GC is unable to determine the difference between pointer and integer values. If an integer value was overwritten the result would be disastrous. It is likely that the result of 1+1 would change occasionally."

However, reading through the RHG, it is shown that Ruby allocates memory for objects in 20-byte slots, which means all real object pointers are multiples of four. This is handy, as it allows the direct encoding of immediates such as Fixnum (an always-odd 'pointer' that you shift right by one bit to get the value) and Symbol (a 'pointer' whose trailing byte is always 0x0e; shift right by one byte to get the unique ID) without requiring object creation.
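
Concretely, the tag checks look roughly like MRI 1.8's macros (a sketch, not the literal source; see ruby.h for the real thing):

    typedef unsigned long VALUE;

    #define FIXNUM_FLAG  0x01
    #define SYMBOL_FLAG  0x0e

    #define FIXNUM_P(v)  ((v) & FIXNUM_FLAG)            /* odd => Fixnum */
    #define SYMBOL_P(v)  (((v) & 0xff) == SYMBOL_FLAG)  /* low byte 0x0e */

    #define INT2FIX(i)   ((VALUE)(((long)(i)) << 1) | FIXNUM_FLAG)
    #define FIX2LONG(v)  (((long)(v)) >> 1)

    /* Special constants (e.g. Qnil == 4 in 1.8) aside, anything else
       that is a multiple of 4 can be taken to be a real heap pointer. */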

With this in mind, can someone enlighten me as to why Copying could not be used inside Ruby? It seems as though it would be trivial for the GC to differentiate between literals and pointers as otherwise they would not be much use as literals.


I don't see how a copying GC can be faster than mark-and-sweep. Transferring a book may be easy, but copying an object can be quite costly. I can't imagine copying almost all objects every time the GC runs.


Suppose that the program only needs about 1 MB of objects, but it will fill up 100 MB before performing a copying GC. In mark and sweep, you need to perform an operation for every garbage object (of which there are 99 MB), whereas with copying GC, you need only perform operations for every live object (of which there is 1 MB). Even if copying an object takes ten times as long as returning a chunk of memory to a free-list, the copying algorithm is faster. (Also, popping a free cell off a free-list is more expensive than incrementing an allocation pointer.) Of course, either way, the total work done for allocating objects and freeing them is O([number of objects allocated]).

Something like this is described in detail here: "Garbage Collection Can Be Faster Than Stack Allocation", Andrew Appel: http://www.cs.princeton.edu/~appel/papers/45.ps
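
To make the allocation half of that argument concrete, here's a hedged sketch (all names invented): the copying collector's allocator is a pointer bump, while a free-list allocator does strictly more work per allocation.

    /* Bump allocation, as used with a copying collector: */
    static char heap[1 << 20];
    static char *alloc_ptr = heap;

    void *copying_alloc(unsigned n) {
        if (alloc_ptr + n > heap + sizeof heap)
            return NULL;              /* time to collect */
        void *p = alloc_ptr;
        alloc_ptr += n;               /* one add, done */
        return p;
    }

    /* Free-list allocation, as used with mark-and-sweep: */
    typedef struct FreeCell { struct FreeCell *next; } FreeCell;
    static FreeCell *free_list;

    void *freelist_alloc(void) {
        if (!free_list) return NULL;  /* time to collect */
        void *p = free_list;
        free_list = free_list->next;  /* extra load, likely a cache miss */
        return p;
    }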


Not necessarily -- lazy sweeping only needs to do work proportional to live data (like copying GC), rather than proportional to dead data.

I have a pretty straightforward implementation of a lazy mark-sweep collector here (in C): https://github.com/silentbicycle/oscar

Basically, instead of marking live data, then sweeping all unmarked data, you just mark live data, then the next time a slot is requested, you step the allocator over the heap until it finds the next unmarked cell and re-use that. If most cells are dead, this happens very quickly. If you have to sweep over too many live cells (left deliberately vague), you grow the heap.
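
In C, the allocation path of that scheme looks roughly like this (a sketch that glosses over epoch handling and heap growth; see the repo above for a real version):

    #define NCELLS    4096
    #define CELL_SIZE 16

    static char cells[NCELLS][CELL_SIZE];
    static unsigned char marked[NCELLS];   /* set by the mark phase */
    static int sweep_pos;

    /* Lazy sweep: scan from the last position for the next unmarked
       (i.e. dead) cell and reuse it, clearing marks on live cells we
       pass over so they're ready for the next mark phase. */
    void *lazy_alloc(void) {
        for (int scanned = 0; scanned < NCELLS; scanned++) {
            int i = sweep_pos;
            sweep_pos = (sweep_pos + 1) % NCELLS;
            if (!marked[i])
                return cells[i];       /* unmarked == garbage: reuse */
            marked[i] = 0;             /* live: skip, reset its mark */
        }
        return NULL;                   /* mostly live: re-mark or grow */
    }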


Interesting... I still feel that this is for the most part merely deferring the freeing work to when you next allocate an object. It may succeed in getting the "free" cost down to a single instruction--checking a mark bit--but it is still a cost per dead object that a copying GC wouldn't have to pay. (The difference may be, as some say, "academic".) It does have the slight advantage of reducing the maximum pause time due to sweeping down to being proportional to live objects rather than garbage.

Also, what if you want to allocate a large object? You might have to skip over some unmarked cells. Either you'd waste the memory, or you would probably want to put them on a free-list of some sort, in which case you are doing more than just bumping a pointer and checking for a mark. ... I see, it says "equally-sized chunks of memory".

Still, that's a cool idea. As mark-and-sweep goes, anyway. :-} I'm rather more fond of copying garbage collectors--I like the determinism, the fact that (normally) the memory address of each object after a full copying GC will depend only on the structure of the graph of objects, and not on the memory addresses of the objects prior to the GC. (In other words, it destroys the information "what the memory address of each object was before this GC". This implies a somewhat higher inherent cost, to eliminate this entropy. See [1] for musings about the relation of this to thermodynamics.) In particular, whether you have fragmentation issues won't depend on the past history of your program. (You could say that the space lost to fragmentation is fixed at a factor of 2.)

[1] http://www.pipeline.com/~hbaker1/ThermoGC.html


Checking the mark bit can be quite cheap, though, particularly if you store the mark bits in their own array - the next several bits will be in-cache. Still, the overall performance is likely to vary quite a bit for either scheme depending on the project's other quirks. Mark-sweep has the advantage that it doesn't need to move objects, but copying is easier with variable-size allocations, etc.

Another disadvantage with lazy sweeping is I don't see a clear way to make it generational. (Right?)


Mark each arena with the last time something was collected from it and just skip sweeping old pages until X cycles have passed? Just thinking out loud... that check wouldn't save time for half-pages of old objects.


The problem with applying Appel's argument in practice is that it only shows that copying garbage collection is better if you are willing to waste RAM. In practice, if you had a 1 MB working set and 100 MB of RAM, you would probably want to use most of the extra 99 MB for caching even slower levels of your storage hierarchy.

Moving garbage collection does have some other potential benefits; it can provide guarantees regarding fragmentation, and it can improve spatial locality of data.


This is why generational GCs are a good idea. You can afford to waste RAM (proportionally) for early generations, because they're small. And for many use cases, this works out great; e.g. in server workloads, most of the allocations made during handling a request will die after the response has been sent. You can get great performance from having sufficient "waste" RAM for (say) 80th percentile total request allocation size times 80th percentile concurrent requests; but this is not normally a huge chunk of total heap size (which will usually be caches of various different kinds, which themselves benefit from a more manual deallocation strategy tied into cache expiration).


Copying is faster if the memory to be collected is mostly garbage. You only copy the good stuff and don't even look at the garbage. Many programs (particularly in scripting languages) allocate memory but only use it for a short time, so they create lots of garbage.

Often this is combined with generational garbage collection - new objects live for a short time in a temporary area, and then if they are still in use (not garbage), they get copied to longer-term storage.


Copying also means moving to new addresses. What strategies are used to update the referencing pointers to the new location?


The pointers are updated as you copy them. (If you know the types ahead of time, you therefore structure objects so that they consist of runs of pointers and runs of non-pointers.)

Wikipedia has a reasonable explanation of the basic algorithm:

http://en.wikipedia.org/wiki/Cheneys_algorithm

It probably gets more complicated if you try to share it with a non-copying collector, but the basic algorithm is very straightforward to implement.
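
For reference, the core of Cheney's algorithm fits in a page of C. A sketch (object layout and names invented; the semispace role swap and forward-bit reset are omitted):

    #include <stddef.h>
    #include <string.h>

    typedef struct Obj {
        struct Obj *forward;    /* set once this object has been moved */
        size_t nfields;
        struct Obj *field[2];   /* fixed fan-out keeps the sketch short */
    } Obj;

    static Obj from_space[1024], to_space[1024];
    static size_t to_top;

    /* Copy one object to to-space, leaving a forwarding pointer behind
       so later references to it resolve to the same new address. */
    static Obj *copy(Obj *o) {
        if (o == NULL) return NULL;
        if (o->forward) return o->forward;      /* already moved */
        Obj *moved = &to_space[to_top++];
        memcpy(moved, o, sizeof *o);
        moved->forward = NULL;
        o->forward = moved;
        return moved;
    }

    void cheney_gc(Obj **roots, size_t nroots) {
        size_t scan = to_top = 0;
        for (size_t i = 0; i < nroots; i++)
            roots[i] = copy(roots[i]);          /* roots updated as copied */
        while (scan < to_top) {                 /* breadth-first scan */
            Obj *o = &to_space[scan++];
            for (size_t f = 0; f < o->nfields; f++)
                o->field[f] = copy(o->field[f]);
        }
        /* from_space is now entirely garbage; swap the spaces' roles. */
        (void)from_space;
    }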


In OCaml, copying GC is used only in the young generation. Functional languages allocate lots of short-lived objects, so a copying GC is a good fit: allocation is as cheap as possible, and since only a few objects survive a collection, the cost of copying them is low. And because the GC is generational, objects get copied at most n times, for some small n.


The JVM (HotSpot) also uses this approach: the young generation is handled by a copying GC while the older one(s) use mark and sweep.

(To be fair, I believe this _was_ the default GC configuration, but it isn't anymore, and there were at least 3 different young-generation GC algorithms that used copying, so it's a bit more nuanced.)


As others have pointed out, a copying GC only needs to access live objects and does not need to go through the garbage.

Another thing to keep in mind is that a copying GC maintains some favorable memory properties. In particular, it reduces fragmentation each time it is run: since you're only copying over live objects, all your remaining objects get put next to each other in memory. This is especially useful if you have relatively few live objects interspersed with lots of garbage.


Copying GCs may be slower to collect if you compare merely identifying live/dead objects against identifying and then copying the live ones, but they bring a significant improvement in allocation speed.

A copying GC lets you allocate by incrementing a pointer, whereas a non-copying collector forces you to traverse the free blocks in the heap looking for a suitable spot, similar to what malloc() does. As the heap fragments over time, this cost grows.

Copying GCs can also improve memory locality during collection, which a non-copying GC cannot do, giving a general performance improvement for the application.


I recommend Elise Huard's recent talk on Ruby GC: http://skillsmatter.com/podcast/home/ruby-bin-men

(Also, I'd love to see the Hungarian Folk Dance interpretation of different GC approaches http://t.co/l8ADbEQR)


Nice introduction, but it seems to be missing some really important concepts:

- generational GC

- concurrent GC, the parts of GC which can be concurrent (which depends on your algorithm), and the tradeoffs needed to let your program run concurrently with the GC without stopping the world.


Very nice article. Given its likely target audience, though, he really needs to explain explicitly why mark-and-sweep can't compact as it goes, rather than leaving a fragmented heap.



