Moving collectors get you the best allocation throughput but impose other costs, which are hard to measure because they are design constraints. Obviously you cannot have a moving conservative collector, so you must have stack maps, safe points, etc.

Or interactions with native code: how can native code hold a reference to a potentially movable object? .NET allows pinned pointers (obviously hurting compaction efficiency), while JNI uses double-dereferenced handles, of which there's a limit (65k in Android!). Compare to, say, JavaScriptCore, which uses a non-moving collector and simply conservatively scans native stacks.

Whether these costs are important depends on your use case, but it's important to remember that we're rarely building isolated systems.

> non-moving generational GC

Yes, that's what Apple built for its ill-fated experiment with GC! Amusingly Apple also built its inverse: a moving manual memory manager. Google MoreMasters for some retro fun!




The design space has one non-obvious but strict dividing line: if you can constrain the mutator code enough to insert write barriers for generational GC, you can also constrain it enough to have all the metadata needed to support moving GC, at least to the extent of opportunistic compaction (e.g. what the CLR and SBCL on i386 do; both conservatively scan the stack, because building stack maps for a register-constrained platform like i386 is essentially impossible).
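To make that concrete, here is a minimal sketch of the kind of card-marking write barrier a generational GC inserts (all names invented, not any particular VM's):

    #include <stdint.h>

    /* Hypothetical heap layout: a byte-per-card table covering the heap.
       CARD_SHIFT, card_table and heap_base are illustrative names. */
    #define CARD_SHIFT 9                      /* 512-byte cards */
    extern uint8_t   card_table[];
    extern uintptr_t heap_base;

    /* Every reference store the mutator performs goes through this helper,
       so the collector later only scans dirty cards for old-to-young
       pointers. */
    static inline void write_ref(void **slot, void *value)
    {
        *slot = value;
        card_table[((uintptr_t)slot - heap_base) >> CARD_SHIFT] = 1; /* dirty */
    }

Once the compiler is required to route every reference store through such a helper, it already knows where every mutable reference slot lives, which is most of the metadata opportunistic compaction needs.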

On the other hand, BEAM AFAIK has a non-moving generational GC implemented in the aforementioned way, which in this case is trivial: you don't need any write barriers or a remembered set when heap objects are inherently immutable. (Somewhat relevant here: Boehm GC for some time had an API that let you signal to the GC the time extent over which a given heap object is mutable; AFAIK it has been a no-op since some 6.x version, and the concurrent/incremental/generational bits of it work on the basis of mprotect() and handling SIGSEGV.)
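For reference, that old Boehm "stubborn object" API looked roughly like this (from older versions of gc.h; as said, the calls later became no-ops, so treat this as historical):

    #include <gc.h>

    /* A "stubborn" object is mutable only between allocation and
       GC_END_STUBBORN_CHANGE(); after that the collector may assume
       it will never be written again. */
    void *make_node(void *left, void *right)
    {
        void **node = GC_MALLOC_STUBBORN(2 * sizeof(void *));
        node[0] = left;
        node[1] = right;
        GC_END_STUBBORN_CHANGE(node);   /* promise: no further writes */
        return node;
    }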


Aside from the stack, another key challenge for moving GCs is hash tables keyed on object identity: if the object can move, the raw address is no longer a suitable hash.

.NET at one point stored an extra per-object word, which was either (via its LSB) a random hash code or a pointer to a metadata object that held the lock, etc.
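Sketched in C (field names mine, not the actual CLR layout; assumes the word has already been lazily installed):

    #include <stdint.h>

    /* A .NET-style multi-use header word: the LSB says whether the word
       holds an identity hash inline or a pointer to a metadata block
       carrying the lock, the hash, etc. */
    typedef struct sync_block { uint32_t hash_code; /* lock state, ... */ } sync_block;
    typedef struct object_header { uintptr_t sync_word; } object_header;

    static uint32_t identity_hash(object_header *h)
    {
        if (h->sync_word & 1)                   /* LSB set: inline hash */
            return (uint32_t)(h->sync_word >> 1);
        return ((sync_block *)h->sync_word)->hash_code;  /* out-of-line */
    }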

Python did this cute thing where moved objects would get an extra word allocated to store the moved-from address, which was where the hash was stored.
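In sketch form (invented names; the real thing lives in the RPython GC), the trick is:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Header flags: has_hash is set once someone asks for the identity
       hash; after the object is copied, an extra trailing word stores the
       pre-move address, which remains the hash forever. */
    typedef struct obj {
        bool has_hash;
        bool was_moved;
        /* ... payload ... */
    } obj;

    extern size_t object_size(obj *o);   /* hypothetical size query */

    static uintptr_t identity_hash(obj *o)
    {
        if (o->was_moved)            /* hash preserved in the extra word */
            return *(uintptr_t *)((char *)o + object_size(o));
        return (uintptr_t)o;         /* not moved yet: address is the hash */
    }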

Apple's plan for its (not-released) ObjC moving GC was to directly teach the GC about hash tables, so it could rehash on collection (!).

Do you know how other implementations handle this?

BEAM's GC is indeed a marvel, both because of the immutability and because of the enforced process isolation, so that each process can do its own GC independently. As you say, constrain the mutator...


CPython has a non-moving GC, so this is not an issue. On the other hand, CPython's hashmap implementation is probably the most educational open-source hashmap implementation you can find, because it is full of wonderful and portable performance hacks (for one thing, the hash values of CPython's objects are intentionally computed such that the distribution is non-uniform, which allows tuning the hashmap implementation for common cases).
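For example, small ints hash to themselves (hash(42) == 42), so dense integer keys land in consecutive slots with no collisions at all; only when collisions do happen does the probe loop mix in the high hash bits. Paraphrased from Objects/dictobject.c:

    #include <stddef.h>

    /* Paraphrase of CPython's open-addressing probe sequence. The first
       probe uses only the low bits (cache friendly for regular hashes);
       each collision shifts more of the hash into play via `perturb`. */
    #define PERTURB_SHIFT 5

    size_t probe_demo(size_t hash, size_t mask, int collisions)
    {
        size_t perturb = hash;
        size_t i = hash & mask;           /* first probe: low bits only */
        while (collisions-- > 0) {
            perturb >>= PERTURB_SHIFT;
            i = (i * 5 + perturb + 1) & mask;
        }
        return i;
    }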

As for tight coupling between a moving GC and the hashmap implementation, there is another reason you want to do that: weak pointers, and hashmaps that are either weak-keyed or weak-valued, both of which are useful for the VM implementation itself (symbol tables, method dispatch caches, identity maps for FFI...).
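As a sketch of what that coupling looks like from the GC's side (entirely invented names, just to show why the collector has to know about these tables):

    #include <stddef.h>

    /* After a copying collection, walk each registered identity-keyed
       weak table: drop entries whose key died, and re-insert survivors
       under their new address, since the identity hash changed. */
    typedef struct entry { void *key; void *value; } entry;

    extern int   gc_is_live(void *ref);      /* did the object survive?  */
    extern void *gc_forwarded(void *ref);    /* its post-copy address    */
    extern void  table_clear(void);          /* hypothetical table hooks */
    extern void  table_insert(void *key, void *value);

    void gc_fixup_weak_identity_table(entry *old_slots, size_t cap)
    {
        table_clear();                        /* start from empty slots  */
        for (size_t i = 0; i < cap; i++) {
            entry e = old_slots[i];
            if (e.key == NULL || !gc_is_live(e.key))
                continue;                     /* empty, or weak key died */
            table_insert(gc_forwarded(e.key), e.value);  /* rehash */
        }
    }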


> CPython has a non-moving GC, so this is not an issue.

I believe the GP was referring to PyPy.

From https://morepypy.blogspot.co.uk/2009/10/gc-improvements.html

> The hash field is not necessarily there; it is only present in classes whose hash is ever taken in the RPython program (which includes being keys in a dictionary). It is an "identity hash": it works like object.__hash__() in Python, but it cannot just be the address of the object in case of a GC that moves objects around.


The BEAM GC algorithm is explained in detail here:

https://www.erlang-solutions.com/blog/erlang-garbage-collect...

I think it’s moving, unless I misunderstood something?


That is why the "AFAIK" was there :)

A few years ago I read a paper from Ericsson that described a non-moving generational GC explicitly designed for Erlang's data model, which could even be implemented in terms of malloc()/free(). I'm not sure it's relevant to how the current BEAM is implemented.


That’s quite possible. The GC has changed several times. There was even a unified heap fork of Erlang at some point: https://www.researchgate.net/profile/Marc_Feeley/publication...


> of which there's a limit (65k in Android!)

Oh man JNI and Android.

I've never heard a developer curse up a storm like I did when one of my co-workers inadvertently stumbled across the 512 LocalRef limit (also a fun one) during an intermittent crash repro.

By the time he got done with his rant we had to talk him out of purchasing a one-way plane ticket to Mountain View.
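For anyone who finds this thread while debugging the same crash, the standard workarounds are DeleteLocalRef and local frames. A minimal sketch:

    #include <jni.h>

    /* Iterating a large jobjectArray without blowing the local reference
       table (512 entries on older Android): delete each element ref as
       soon as you're done, or scope a batch with a local frame. */
    void process_all(JNIEnv *env, jobjectArray arr)
    {
        jsize n = (*env)->GetArrayLength(env, arr);
        for (jsize i = 0; i < n; i++) {
            jobject item = (*env)->GetObjectArrayElement(env, arr, i);
            /* ... use item ... */
            (*env)->DeleteLocalRef(env, item);  /* don't accumulate refs */
        }

        /* Alternatively, bracket a chunk of work with its own frame: */
        if ((*env)->PushLocalFrame(env, 16) == 0) {
            /* ... create up to ~16 local refs freely ... */
            (*env)->PopLocalFrame(env, NULL);   /* frees them all at once */
        }
    }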


As a Java dev, there are so many things that just feel wrong on Android.

Leaving aside how Google treated Sun, the whole framework feels like it was written by former C and C++ devs learning Java on the job while implementing Android.


Sorry, but on constrained platforms all frameworks/software tend to look this way. This is how a lot of game code looks internally, doing horrible things behind the scenes to keep it safe.

So if a native Java dev had written the whole platform, how would he have handled the platform's limitations any differently, except with a lot of abstract wrapping and exceptional exceptions?

There is a reason there is no RTE every time something hits the limits of what the machine can do. Projecting one's unwillingness to cope with reality onto a developer with a difficult job is sort(of, sad);


That wasn't my point about the Android APIs feeling wrong; surely there are some restrictions due to programming resource-constrained devices.

Any Java developer that used Java Card, J2ME or Embedded Java is quite aware of them.

A native Java dev would not have used Hungarian notation, snake-case identifiers, allocation of classes just to fake out-parameters which could have been returned as results, event handlers with unused parameters repeating AWT design errors, and a couple of other things that I could still rant on.


I've mentioned it earlier, but: if you need native code, use direct byte buffers as shared memory between Java/C, and hold/use no refs inside the native code.


Those lucky enough to focus on Oreo can make use of the new shared memory classes that were introduced alongside the new microkernel-like architecture for Treble drivers.


Objective-C GC's main failure was that it is impossible to have any good GC implementation in a language with C semantics.

On top of that, having developers mix GC-compiled libraries with others compiled the traditional way was a good recipe for random runtime crashes.


A tip for JNI: use direct byte buffers. The address remains the same within the process's virtual address space (hence non-movable).

Of course that leaves some memory management in Java itself, but the price is usually fine for having void* access in C.
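Concretely, the native side reduces to something like this (the function name and Java class are made up; the JNI calls are the real ones). The Java side allocates with ByteBuffer.allocateDirect(n) and passes the buffer down:

    #include <jni.h>
    #include <string.h>

    /* The direct buffer's backing storage never moves, so the raw
       pointer stays valid for the buffer's lifetime, with no pinning. */
    JNIEXPORT void JNICALL
    Java_com_example_Native_fill(JNIEnv *env, jclass cls, jobject buf)
    {
        void *p   = (*env)->GetDirectBufferAddress(env, buf);
        jlong len = (*env)->GetDirectBufferCapacity(env, buf);
        if (p != NULL && len > 0)
            memset(p, 0xAB, (size_t)len);   /* plain void* access in C */
    }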



