
I don't see how a Copying GC can be faster than Mark and Sweep. Transferring a book may be easy, but copying an object can be quite costly. I can't imagine copying almost all objects every time the GC runs.



Suppose that the program only needs about 1 MB of objects, but it will fill up 100 MB before performing a copying GC. In mark and sweep, you need to perform an operation for every garbage object (of which there are 99 MB), whereas with copying GC, you need only perform operations for every live object (of which there is 1 MB). Even if copying an object takes ten times as long as returning a chunk of memory to a free-list, the copying algorithm is faster. (Also, popping a free cell off a free-list is more expensive than incrementing an allocation pointer.) Of course, either way, the total work done for allocating objects and freeing them is O([number of objects allocated]).

Something like this is described in detail here: "Garbage Collection Can Be Faster Than Stack Allocation", Andrew Appel: http://www.cs.princeton.edu/~appel/papers/45.ps


Not necessarily -- lazy sweeping only needs to do work proportional to live data (like copying GC), rather than proportional to dead data.

I have a pretty straightforward implementation of a lazy mark-sweep collector here (in C): https://github.com/silentbicycle/oscar

Basically, instead of marking live data, then sweeping all unmarked data, you just mark live data, then the next time a slot is requested, you step the allocator over the heap until it finds the next unmarked cell and re-use that. If most cells are dead, this happens very quickly. If you have to sweep over too many live cells (left deliberately vague), you grow the heap.


Interesting... I still feel that this is for the most part merely deferring the freeing work to when you next allocate an object. It may succeed in getting the "free" cost down to a single instruction--checking a mark bit--but it is still a cost per dead object that a copying GC wouldn't have to pay. (The difference may be, as some say, "academic".) It does have the slight advantage of reducing the maximum pause time due to sweeping down to being proportional to live objects rather than garbage.

Also, what if you want to allocate a large object? You might have to skip over some unmarked cells. Either you'd waste the memory, or you would probably want to put them on a free-list of some sort, in which case you are doing more than just bumping a pointer and checking for a mark. ... I see, it says "equally-sized chunks of memory".

Still, that's a cool idea. As mark-and-sweep goes, anyway. :-} I'm rather more fond of copying garbage collectors--I like the determinism, the fact that (normally) the memory address of each object after a full copying GC will depend only on the structure of the graph of objects, and not on the memory addresses of the objects prior to the GC. (In other words, it destroys the information "what the memory address of each object was before this GC". This implies a somewhat higher inherent cost, to eliminate this entropy. See [1] for musings about the relation of this to thermodynamics.) In particular, whether you have fragmentation issues won't depend on the past history of your program. (You could say that the space lost to fragmentation is fixed at a factor of 2.)

[1] http://www.pipeline.com/~hbaker1/ThermoGC.html


Checking the mark bit can be quite cheap, though, particularly if you store the mark bits in their own array - the next several bits will be in-cache. Still, the overall performance is likely to vary quite a bit for either scheme depending on the project's other quirks. Mark-sweep has the advantage that it doesn't need to move objects, but copying is easier with variable-size allocations, etc.

Another disadvantage with lazy sweeping is I don't see a clear way to make it generational. (Right?)


Mark each arena with the last time something was collected from it and just skip sweeping old pages until X cycles have passed? Just thinking out loud... that check wouldn't save time for half-pages of old objects.


The problem with applying Appel's argument in practice is that it only shows that copying garbage collection is better if you are willing to waste RAM. In practice, if you had a 1 MB working set and 100 MB of RAM, you would probably want to use most of the extra 99 MB for caching even slower levels of your storage hierarchy.

Moving garbage collection does have some other potential benefits; it can provide guarantees regarding fragmentation, and it can improve spatial locality of data.


This is why generational GCs are a good idea. You can afford to waste RAM (proportionally) for early generations, because they're small. And for many use cases, this works out great; e.g. in server workloads, most of the allocations made during handling a request will die after the response has been sent. You can get great performance from having sufficient "waste" RAM for (say) 80th percentile total request allocation size times 80th percentile concurrent requests; but this is not normally a huge chunk of total heap size (which will usually be caches of various different kinds, which themselves benefit from a more manual deallocation strategy tied into cache expiration).


Copying is faster if the memory to be collected is mostly garbage. You only copy the good stuff and don't even look at the garbage. Many programs (particularly in scripting languages) allocate memory but only use it for a short time, so they create lots of garbage.

Often this is combined with generational garbage collection - new objects live for a short time in a temporary area, and then if they are still in use (not garbage), they get copied to longer-term storage.


Copying also means moving to new addresses. What strategies are used to update the referencing pointers to the new location?


The pointers are updated as you copy them. (If you know the types ahead of time, you therefore structure objects so that they consist of runs of pointers and runs of non-pointers.)

Wikipedia has a reasonable explanation of the basic algorithm:

http://en.wikipedia.org/wiki/Cheneys_algorithm

It probably gets more complicated if you try to share it with a non-copying collector, but the basic algorithm is very straightforward to implement.


In OCaml, copying GC is used only in the young generation. Functional languages allocate lots of short-lived objects, so copying GC is a good fit: allocation is the cheapest possible, and as only a few objects survive a GC, the cost of copying them is low. Since the GC is generational, objects get copied at most n times, for some n.


The JVM (HotSpot) also uses this approach: the young generation is handled by a copying GC while the older one(s) use mark & sweep.

(to be fair, I believe this _was_ the default GC option, but it's not anymore, and there were at least 3 different young generation GC algorithms that used copying, so it's a bit more nuanced)


As others have pointed out, a copying GC only needs to access live objects and does not need to go through the garbage.

Another thing to keep in mind is that a copying GC maintains some favorable memory properties. In particular, it reduces fragmentation each time it is run: since you're only copying over live objects, all your remaining objects get put next to each other in memory. This is especially useful if you have relatively few objects interspersed with lots of garbage.


Copying GCs may be slower to collect if you compare the cost of merely identifying live/dead objects against identifying and then copying live objects, but they offer a significant improvement in the speed of allocation.

Copying GCs allow you to perform allocation by incrementing a pointer, whereas non-copying collectors force you to traverse the free blocks in the heap looking for a suitable allocation spot, similar to what malloc() does. As the heap fragments over time, this cost becomes greater and greater.

Copying GCs can also improve memory locality during collection, which a non-copying GC cannot do, giving a general performance improvement for the application.



