Hacker News new | past | comments | ask | show | jobs | submit login

The design space has one non-obvious but fundamental strict dividing line: if you can constrain the mutator code enough to be able to insert write barriers for generation GC you also can constrain it enough to have all the metadata to support moving GC at least to the extent of opportunistic compaction (eg. what CLR and SBCL on i386 does, both of which have conservationaly scanned stack because building stack map for register-constrained platforms like i386 is essentially impossible).

On the other hand BEAM has AFAIK non-moving generational GC implemented in the aforementioned way which is in this case trivial as you don't need any write barriers and remembered set when heap objects are inherently immutable. (In this it is somewhat relevant that Boehm GC for some time had API that allowed you to signal the time extent of mutatibility of given heap object to the GC, AFAIK it is no-op since some 6.x version and the concurrent/incremental/generational bits of it work on basis of mprotect() and handling SIGSEGV)




Aside from the stack, another key challenge for moving GCs is hash tables keyed on object identity. If the object can move the raw address is no longer a suitable hash.

.NET at one point stored an extra per-object word, which was either (via its LSB) a random hash code or a pointer to a metadata object that held the lock, etc.

Python did this cute thing where moved objects would get an extra word allocated to store the moved-from address, which was where the hash was stored.

Apple's plan for its (not-released) ObjC moving GC was to directly teach the GC about hash tables, so it could rehash on collection (!).

Do you know how other implementations handle this?

BEAM's GC is indeed a marvel both because of the immutability and also the enforced process isolation, so that each process can do its own GC independently. As you say, constrain the mutator...


Cpython has non-moving GC so this is not an issue. On the other hand CPython's hashmap implementation is probably most educational open source hashmap implementation that you can find, because it is full of wonderful and portable performance hacks (for one thing the hash values of CPython's objects is intentionally computed such that the distribution is non-uniform which allows tuning of the hashmap implementation for common cases).

As for having tight coupling between moving GC and hashmap implementation there is another reason why you want to do that: weak pointers and hashmaps that are either weak keyed or weak valued which both are things that are useful for the VM implementation itself (symbol tables, method dispatch caches, identity maps for FFI...).


> Cpython has non-moving GC so this is not an issue.

I believe the GP was referring to PyPy.

From https://morepypy.blogspot.co.uk/2009/10/gc-improvements.html

> The hash field is not necessarily there; it is only present in classes whose hash is ever taken in the RPython program (which includes being keys in a dictionary). It is an "identity hash": it works like object.__hash__() in Python, but it cannot just be the address of the object in case of a GC that moves objects around.


The BEAM GC algorithm is explained in details here:

https://www.erlang-solutions.com/blog/erlang-garbage-collect...

I think it’s moving, unless I misunderstood something?


That is why the "AFAIK" was there :)

Few years ago I read some paper from Ericsson that described non-moving generational GC explicitly designed for Erlang's data model which could even be implemented in terms of malloc()/free(), I'm not sure that it was relevant to how it is implemented in current BEAM.


That’s quite possible. The GC has changed several times. There was even a unified heap fork of Erlang at some point: https://www.researchgate.net/profile/Marc_Feeley/publication...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: