The design space has one non-obvious but fundamental strict dividing line: if yo...

ridiculous_fish · on Jan 25, 2018

Aside from the stack, another key challenge for moving GCs is hash tables keyed on object identity. If the object can move the raw address is no longer a suitable hash.

.NET at one point stored an extra per-object word, which was either (via its LSB) a random hash code or a pointer to a metadata object that held the lock, etc.

Python did this cute thing where moved objects would get an extra word allocated to store the moved-from address, which was where the hash was stored.

Apple's plan for its (not-released) ObjC moving GC was to directly teach the GC about hash tables, so it could rehash on collection (!).

Do you know how other implementations handle this?

BEAM's GC is indeed a marvel both because of the immutability and also the enforced process isolation, so that each process can do its own GC independently. As you say, constrain the mutator...

dfox · on Jan 25, 2018

Cpython has non-moving GC so this is not an issue. On the other hand CPython's hashmap implementation is probably most educational open source hashmap implementation that you can find, because it is full of wonderful and portable performance hacks (for one thing the hash values of CPython's objects is intentionally computed such that the distribution is non-uniform which allows tuning of the hashmap implementation for common cases).

As for having tight coupling between moving GC and hashmap implementation there is another reason why you want to do that: weak pointers and hashmaps that are either weak keyed or weak valued which both are things that are useful for the VM implementation itself (symbol tables, method dispatch caches, identity maps for FFI...).

mrgriffin · on Jan 25, 2018

> Cpython has non-moving GC so this is not an issue.

I believe the GP was referring to PyPy.

From https://morepypy.blogspot.co.uk/2009/10/gc-improvements.html

> The hash field is not necessarily there; it is only present in classes whose hash is ever taken in the RPython program (which includes being keys in a dictionary). It is an "identity hash": it works like object.__hash__() in Python, but it cannot just be the address of the object in case of a GC that moves objects around.

ramchip · on Jan 25, 2018

The BEAM GC algorithm is explained in details here:

https://www.erlang-solutions.com/blog/erlang-garbage-collect...

I think it’s moving, unless I misunderstood something?

dfox · on Jan 25, 2018

That is why the "AFAIK" was there :)

Few years ago I read some paper from Ericsson that described non-moving generational GC explicitly designed for Erlang's data model which could even be implemented in terms of malloc()/free(), I'm not sure that it was relevant to how it is implemented in current BEAM.

ramchip · on Jan 25, 2018

That’s quite possible. The GC has changed several times. There was even a unified heap fork of Erlang at some point: https://www.researchgate.net/profile/Marc_Feeley/publication...