Gosh, I hate to be that "old guy," but using bitmaps for mark & sweep GC dates from the early 70's.
In particular, in GC implementations for Lisp and Lisp-like languages (such as the ECL implementation we used at Harvard in the mid-70's) where CONS cells were two eighteen-bit halfwords (CAR and CDR) fitting in a 36-bit word, there was no place for mark bits, so you'd use a bitmap instead (as rediscovered here).
And we used the same technique for finding the page headers (containing, e.g., the mark bitmaps) for each part of the heap: by aligning pages to a power of two, you could chop any pointer down by and'ing off the lower bits to get to its page header.
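For the curious, here's a minimal sketch of that trick in C. The page layout and all names are made up for illustration, and pages are assumed to be allocated on boundaries aligned to their power-of-two size:

    /* Hypothetical page layout: a header (holding the mark bitmap)
     * at the start of each PAGE_SIZE-aligned page, cells after it. */
    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SIZE 4096u                      /* must be a power of two */
    #define CELL_SIZE (2 * sizeof(void *))       /* e.g., a two-pointer cons */
    #define CELLS_PER_PAGE (PAGE_SIZE / CELL_SIZE)

    struct page_header {
        uint8_t mark_bits[CELLS_PER_PAGE / 8];   /* one bit per cell slot */
        /* ... other per-page metadata ... */
    };

    /* Chop any interior pointer down to its page header by and'ing
     * off the low bits. */
    static struct page_header *page_of(void *p) {
        return (struct page_header *)
            ((uintptr_t)p & ~(uintptr_t)(PAGE_SIZE - 1));
    }

    /* Set the mark bit for the cell containing p. Indexing is relative
     * to the page start, so the few bitmap slots that overlap the
     * header itself simply go unused. */
    static void mark(void *p) {
        struct page_header *h = page_of(p);
        size_t i = ((uintptr_t)p - (uintptr_t)h) / CELL_SIZE;
        h->mark_bits[i / 8] |= (uint8_t)(1u << (i % 8));
    }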
There really ain't much new under the sun. Too bad every generation has to rediscover all this stuff.
Indeed - information about good garbage collectors is readily available. Paul R. Wilson's "Uniprocessor Garbage Collection Techniques" (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.2...) is an excellent survey, and Jones, Hosking, and Moss's _The Garbage Collection Handbook_ is an excellent reference. (The previous book by Jones and Lins is also great, but the new one adds important material on concurrent GCs, real-time GC, and interactions between the GC and runtime systems.)
As an exercise, I wrote a simple standalone mark-bit-page & lazy sweeping garbage collector for C. (https://github.com/silentbicycle/oscar) Marking occurs in a user-provided callback, since I don't assume any particular data structure. It's only ~300 LOC.
> Too bad every generation has to rediscover all this stuff.
Just because this technique existed 30 years ago doesn't mean that the Ruby folks only recently "rediscovered" it and excitedly implemented it while musing, "how did we miss this technique?!?" There are likely lots of reasons why this wasn't used before. The abstract of their paper seems to shed some light on this:
> [...] But Ruby interpreter runs on various platforms, and we do not have portable memory allocation API to obtain aligned memory region without wasting region. In this paper, we propose portable scheme to map from object pointers to corresponding bitmap table in constant time. [...]
I may be totally wrong, and the paper is entirely in Japanese, but I think there is more to it.
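For context, the classic portable workaround is to over-allocate with plain malloc and round the pointer up, which costs you up to align-1 wasted bytes per region; presumably that's the "wasting region" the abstract mentions. A sketch of that workaround (not the paper's actual scheme), assuming align is a power of two:

    #include <stdlib.h>
    #include <stdint.h>

    /* Portable aligned allocation without posix_memalign et al.:
     * over-allocate by align-1 bytes and round up. The caller must
     * keep *raw_out around to free() the block later. */
    void *alloc_aligned(size_t size, size_t align, void **raw_out) {
        void *raw = malloc(size + align - 1);  /* waste: up to align-1 bytes */
        if (raw == NULL)
            return NULL;
        *raw_out = raw;
        uintptr_t p = ((uintptr_t)raw + (align - 1)) & ~(uintptr_t)(align - 1);
        return (void *)p;
    }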
Ruby has been around since 1995. That's over 15 years.
Implementing bitmaps for mark-and-sweep is a project well within the skills of a final-year undergrad, or a first-year graduate student of CS.
You can say many things, but I doubt they didn't implement this because they didn't have time, or didn't have the skills to do so. They merely prioritized against it from day one.
Not really excited. Sorry. To me this is an incremental improvement to a method of GC that is just not going to cut it.
Rubinius uses a generational GC. There are definitely some parts of Rubinius that are (currently) slower than MRI, but memory management stands to be one of the places where Rubinius will demolish MRI in the long run.
JRuby/JVM uses (from what I understand) a Generational-first hybrid GC. There's been so much work and analysis done on the JVM that there's no way in hell any mark and sweep GC is going to compete.
Honestly if MRI is going to remain competitive it needs to completely rethink the way it does GC.
JRuby seems like it has the most potential in the GC department; most of the interesting GC papers I've seen describe work done on the JVM.
Most [1] modern GC implementations use generational copying collectors. A copying GC does not need to maintain a bit for each allocated object, and therefore will not break copy-on-write.
They are also much faster for many common workloads (a sketch of the copying approach follows the footnote below).
[1] I suppose this is debatable. Oddly, many modern scripting languages use ancient compiler technology for no discernible reason. See also: the GIL.
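For anyone unfamiliar with why a copying collector needs no mark bits: in Cheney-style copying, liveness is recorded by a forwarding pointer written into the old copy of each reached object, not by a bit in every object. A minimal sketch, with a made-up object layout:

    #include <stddef.h>
    #include <string.h>

    typedef struct obj {
        size_t size;             /* total size in bytes, incl. header;
                                    assumed to keep alloc_ptr aligned */
        struct obj *forward;     /* set once the object has been copied */
        size_t nfields;          /* number of pointer fields below */
        struct obj *fields[];    /* pointer fields, for tracing */
    } obj;

    static char *alloc_ptr;      /* bump pointer into to-space */

    /* Copy one object to to-space, leaving a forwarding pointer behind.
     * The forwarding pointer doubles as the "mark": no per-object bit. */
    static obj *evacuate(obj *o) {
        if (o == NULL || o->forward)
            return o ? o->forward : NULL;
        obj *copy = (obj *)alloc_ptr;
        memcpy(copy, o, o->size);
        copy->forward = NULL;
        alloc_ptr += o->size;
        o->forward = copy;
        return copy;
    }

    /* Cheney scan: to-space itself serves as the work queue. */
    static void collect(obj **roots, size_t nroots) {
        char *scan = alloc_ptr;
        for (size_t i = 0; i < nroots; i++)
            roots[i] = evacuate(roots[i]);
        while (scan < alloc_ptr) {
            obj *o = (obj *)scan;
            for (size_t i = 0; i < o->nfields; i++)
                o->fields[i] = evacuate(o->fields[i]);
            scan += o->size;
        }
    }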
> Oddly, many modern scripting languages use ancient compiler technology
Historical: most of them are 10~20 years old; "new compiler technologies" had not trickled down much when they were created, and they had no real need for the things those provided (not to mention those things have drawbacks: a GIL has better single-threaded performance than fine-grained locking, which is important when single-threaded is your primary workload).
Also manpower, probably: it's harder to implement a generational concurrent GC than refcounting or a basic M&S, and it's even harder to retrofit that into an existing codebase (hence PyPy having pluggable GC backends and defaulting to a hybrid GC).
And they both had to spend a lot of effort to get concurrent GC to work efficiently. The biggest downside of concurrent GC is that you need read barriers, not just write barriers as in single-threaded incremental garbage collection.
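To make that distinction concrete, here's a sketch of the two barrier flavors in C. gc_shade and gc_current_addr are hypothetical collector hooks; the write side is a Dijkstra-style insertion barrier:

    #include <stdbool.h>

    extern bool gc_marking;                  /* is a collection in progress? */
    extern void gc_shade(void *obj);         /* hypothetical: mark obj grey */
    extern void *gc_current_addr(void *obj); /* hypothetical: follow forwarding */

    /* Write barrier: enough for single-threaded incremental marking.
     * Runs only on pointer stores, which are comparatively rare. */
    static void write_field(void **slot, void *val) {
        if (gc_marking && val != NULL)
            gc_shade(val);
        *slot = val;
    }

    /* Read barrier: needed once the collector runs concurrently and may
     * be moving (or still marking) objects the mutator loads. Runs on
     * every pointer load, which is why it's so much more expensive. */
    static void *read_field(void **slot) {
        void *val = *slot;
        return (gc_marking && val != NULL) ? gc_current_addr(val) : val;
    }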
I have a theory, which has so far held up [1], that every remotely popular language that needs automatic storage reclamation ends up growing a generational and incremental/concurrent collector, either in its main branch or in a fork [2]. It's just a matter of time. Efficient automatic storage reclamation is hard.
[1] Language developers sometimes claim they don't need it. I believe they're on the wrong side of history.
[2] Forks are common for this, and for good reason; oftentimes languages that start with reference counting end up taking large dependencies on external libraries that manipulate reference counts (like Python with its Py_INCREF and Py_DECREF), at which point the language can't fully move to tracing GC without breaking its ecosystem.
Yes, but generally if you're forking an existing process, then maybe 90%+ of the objects have already been tenured into the old space. It's only young or newly created objects that are subsequently found to be long lived that will need to be copied.
The state of the art for garbage collection is two-space copying for the nursery, and mark-and-sweep for the tenured generation. That way you avoid copying the whole heap on every GC; you only get copying for the nursery. You may have a compaction pass for the tenured generation that runs very rarely (much more rarely than a regular minor or major GC). V8, Java, and OCaml all follow this model, and I believe .NET and Haskell do as well. Lua skips the generational part for simplicity's sake (as games care a lot more about latency than throughput); SpiderMonkey does too, because Mozilla hasn't implemented generational GC yet.
If fork() performance is important to you, you probably don't want compaction for the tenured generation at all. That's totally fine; while fragmentation will accumulate over time, it's no worse than C++. You'll still have copying for the nursery, but the nursery fills up and gets recycled so often that the COW optimization isn't really helping you there anyhow.
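That recycling rate is also why allocation is so cheap in these systems: the nursery fast path is just a bump pointer. A sketch, with hypothetical names:

    #include <stddef.h>

    extern char *nursery_ptr, *nursery_end;
    extern void *minor_gc_then_alloc(size_t size);  /* hypothetical slow path */

    /* Fast path: a couple of instructions. The slow path copies
     * survivors out of the nursery and resets nursery_ptr. */
    static void *nursery_alloc(size_t size) {
        if ((size_t)(nursery_end - nursery_ptr) < size)
            return minor_gc_then_alloc(size);
        void *p = nursery_ptr;
        nursery_ptr += size;
        return p;
    }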
Second most straightforward, perhaps. They previously had the bit for marking inside objects, rather than on a separate memory page of just mark bits, which interacts poorly with copy-on-write.
I'm pretty sure you got downvoted for writing something that didn't add anything to the conversation.
"Oh, verelo's never been excited about Ruby? Well that changes everything! I'm sure glad I know his personal opinion, offered without any commentary related to the content of the article or even off-topic supporting reasons!"
Fair enough, I'm honestly just a little tired of justifying why Ruby has never excited me (not so much on here, but in real life).
I find that other languages I'm already a lot more familiar with, and that have very well-documented functionality, offer everything I need.
My main issues have been installation headaches and deprecated calls in other people's code (without an easy way to swap in a new function). That's really just the start of it, but it's enough for me to have lost interest until I see some significant changes to the language, more significant than this one.
You would have the same problem with deprecations in any open-source project that is either unmaintained or that you can't update for whatever internal reasons. The language has nothing to do with that. As for installation issues, rvm solved those for me long ago.
RVM does help; deprecations... I don't know, though. I have found that the language has gone through a lot of changes in a short period of time, which is great, but it hasn't been very convenient for apps that have not been closely maintained.