> GC != RC. I am aware. > But that’s not what GC wants, unless (as I said before...

pizlonator · 2024-07-09T17:20:36 1720545636

> I can't figure out exactly what you mean here. Temporary pins on the stack should be fine, and I don't see why anything else would be necessary.

The browser has a whole native heap (i.e. C++ objects, in current impls) that participates in the JS GC heap as follows:

- If the C++ object is referenced from C++ or from JS, it must be kept alive.

- If the C++ object references a JS object, then the JS object must be kept alive, so long as the C++ object would have been alive per the previous rule.

- It's possible to have an object reference chain like JSobject->C++object->C++object->JSobject. Assuming there are no other pointers to the C++ objects, the last JS object should be kept alive by the GC only if the first JS object is alive.

- It's possible for dead reference cycles to exist like JSobject<->C++object, in which case both should die.

This requires that C++ has the ability to place references to JS objects in C++ fields.

This is where pinning comes in. It would be quite simple (and memory safe) but also totally incorrect (i.e. massive memory leak) to say that if a C++ field points to a JS object, then the GC just sees that field as a root. This is what I mean by pinning. (Note that "pinning" has many meanings in GC jargon; I'm using the Hermes version of the jargon. I guess I could have said "strong root" or something, but that's weirder to say.) This would be wrong, since it would not allow us to collect the dead cycle at all. Dead cycles are a common case in the browser. It would also cause other subtle breakage.
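To make the leak concrete, here is a minimal sketch of a toy mark phase over a mixed heap. Everything in it (`Kind`, `Node`, `mark`, `pinned_roots`, `demo_heap`, the index-based heap) is made up for illustration and is not how any real browser lays things out. It compares tracing through native objects with the "every native field is a root" policy.

```rust
use std::collections::HashSet;

/// Which heap a node notionally lives on (illustrative only).
#[derive(Clone, Copy, PartialEq)]
enum Kind {
    Js,
    Native,
}

/// A toy heap object: its kind plus outgoing references (indices into the heap).
struct Node {
    kind: Kind,
    edges: Vec<usize>,
}

/// Mark everything reachable from `roots` by following edges through both heaps.
fn mark(heap: &[Node], roots: &[usize]) -> HashSet<usize> {
    let mut live = HashSet::new();
    let mut stack: Vec<usize> = roots.to_vec();
    while let Some(i) = stack.pop() {
        if live.insert(i) {
            stack.extend(heap[i].edges.iter().copied());
        }
    }
    live
}

/// The "pinning" policy: every JS object stored in a native field is a root.
fn pinned_roots(heap: &[Node]) -> Vec<usize> {
    heap.iter()
        .filter(|n| n.kind == Kind::Native)
        .flat_map(|n| n.edges.iter().copied())
        .filter(|&target| heap[target].kind == Kind::Js)
        .collect()
}

/// 0 (JS) -> 1 (native) -> 2 (native) -> 3 (JS), plus a cycle 4 (JS) <-> 5 (native).
fn demo_heap() -> Vec<Node> {
    vec![
        Node { kind: Kind::Js, edges: vec![1] },
        Node { kind: Kind::Native, edges: vec![2] },
        Node { kind: Kind::Native, edges: vec![3] },
        Node { kind: Kind::Js, edges: vec![] },
        Node { kind: Kind::Js, edges: vec![5] },
        Node { kind: Kind::Native, edges: vec![4] },
    ]
}
```

With object 0 as the only real root, `mark(&heap, &[0])` keeps the chain alive and lets the 4 <-> 5 cycle die. With no roots at all, nothing survives. Under the pinning policy, `mark(&heap, &pinned_roots(&heap))` keeps 3, 4 and 5 alive even though nothing outside references them, which is the leak.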

So, what the browser does instead is to have the C++ heap participate in GC: every C++ object that could possibly store a reference to JS objects anywhere can respond to GC callbacks asking it to account for all of those references. And, every C++ object needs to have a story for being referenced exclusively from JS, exclusively from C++, or a combo of the two. And the C++ code needs to be able to participate in whatever barrier discipline is necessary to get generations or incrementality or concurrency that the JS heap wants.
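A rough sketch of what "respond to GC callbacks" could look like, written in Rust for brevity. The names (`JsRef`, `Visitor`, `Trace`, `DomNode`, `EventListener`, `collect_js_refs`) are hypothetical and are not the API of Oilpan or any other real engine; barriers are left out entirely.

```rust
/// Hypothetical opaque handle to an object on the JS heap.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct JsRef(u32);

/// Hypothetical visitor the collector passes to native objects while marking.
#[derive(Default)]
struct Visitor {
    marked: Vec<JsRef>,
}

impl Visitor {
    /// Record a JS reference, ignoring ones already seen.
    fn visit(&mut self, r: JsRef) {
        if !self.marked.contains(&r) {
            self.marked.push(r);
        }
    }
}

/// The callback: a native object must report every JS reference it holds.
trait Trace {
    fn trace(&self, visitor: &mut Visitor);
}

struct EventListener {
    callback: JsRef,
}

impl Trace for EventListener {
    fn trace(&self, visitor: &mut Visitor) {
        visitor.visit(self.callback);
    }
}

struct DomNode {
    /// `None` models a node that is referenced exclusively from native code.
    wrapper: Option<JsRef>,
    listeners: Vec<EventListener>,
}

impl Trace for DomNode {
    fn trace(&self, visitor: &mut Visitor) {
        if let Some(w) = self.wrapper {
            visitor.visit(w);
        }
        // Nested native objects are traced through, not treated as roots.
        for listener in &self.listeners {
            listener.trace(visitor);
        }
    }
}

/// Ask every native object already known to be live for its JS references.
fn collect_js_refs(live: &[&dyn Trace]) -> Vec<JsRef> {
    let mut visitor = Visitor::default();
    for obj in live {
        obj.trace(&mut visitor);
    }
    visitor.marked
}
```

The point of the shape is that the native object, not the collector, decides what gets reported.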

There are different ways to do this. Blink's Oilpan is probably the most principled, and that's basically a whole GC-for-C++ framework - very complex stuff. So, tons of inherently not-memory-safe code on the browser side just so it can do business with the JS heap.

Dylan16807 · 2024-07-09T18:51:24 1720551084

I'm trying to figure out where our understandings differ, since that generally sounds familiar and reasonable to me. I guess you're assuming the callback/accounting code needs to be unsafe or able to violate the GC's preconditions? But I don't see why you assume that. As part of the GC, build some data structures that can handle that accounting and present a safe API, then use those data structures anywhere you don't want to create a root. When the browser code accesses the contents, barriers can be applied automatically.

The hard part of using a data structure like that is giving it ownership of the data and control over destroying it, but Rc does that too, doesn't it? That kind of thing is why I mentioned Rc. The difference between Rc and GC is much more in the behind-the-scenes tracking than in the API it gives.

pizlonator · 2024-07-09T19:40:23 1720554023

> I'm trying to figure out where our understandings differ, since that generally sounds familiar and reasonable to me. I guess you're assuming the callback/accounting code needs to be unsafe or able to violate the GC's preconditions? But I don't see why you assume that. As part of the GC, build some data structures that can handle that accounting and present a safe API, then use those data structures anywhere you don't want to create a root. When the browser code accesses the contents, barriers can be applied automatically.

If this code was all written in Rust, I could imagine there being almost no uses of `unsafe`, except for one: the thing where the GC decides to delete an object.

But this means that all of that code that isn't marked `unsafe`, but instructs the GC about what objects to mark or not, is really super unsafe because if it makes a logic error in telling the GC what to mark then the GC will delete an object that it should not have deleted.

So, the problem here isn't that you can't wrap the unsafe stuff in a safe API. The problem is that even if you do that, all of your seemingly-safe code is really super unsafe.
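As an illustration of that failure mode, here is a small sketch with no `unsafe` anywhere in which a forgotten field causes the collector to pick a still-referenced object for deletion. The types (`JsId`, `Trace`, `ImageElement`, `sweep`) are invented for this example; in a real collector holding raw pointers, the forgotten field would now dangle.

```rust
use std::collections::HashSet;

/// Hypothetical identifier for an object on the JS heap.
type JsId = u32;

/// Marking callback: report every JS object this native object references.
trait Trace {
    fn trace(&self, marked: &mut HashSet<JsId>);
}

struct ImageElement {
    onload: JsId,
    cached_bitmap: JsId,
}

impl Trace for ImageElement {
    fn trace(&self, marked: &mut HashSet<JsId>) {
        marked.insert(self.onload);
        // BUG: `cached_bitmap` is never reported. This compiles cleanly
        // and contains no `unsafe`, but it tells the GC the wrong thing.
    }
}

/// Returns the JS objects the collector would delete after marking.
fn sweep(all_js: &[JsId], live_natives: &[&dyn Trace]) -> Vec<JsId> {
    let mut marked = HashSet::new();
    for native in live_natives {
        native.trace(&mut marked);
    }
    all_js
        .iter()
        .copied()
        .filter(|id| !marked.contains(id))
        .collect()
}
```

Sweeping a heap containing objects 1 and 2 with a live `ImageElement { onload: 1, cached_bitmap: 2 }` reports object 2 as garbage while the element still holds it.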

> The hard part of using a data structure like that is giving it ownership of the data and control over destroying it, but Rc does that too, doesn't it? That kind of thing is why I mentioned Rc. The difference between Rc and GC is much more in the behind-the-scenes tracking than in the API it gives.

The difference between RC and GC is exactly that the APIs they give you are radically different. The behind-the-scenes tracking is different, too.

It's totally valid to tell RC "I want to point at this object so keep it alive".

But that's not the API that the GC will give you, unless it's a pinning API. The API where you tell the GC "keep this alive" will prevent the deletion of the garbage cycles I mentioned earlier.

So, the GC (in the case of something like a browser) gives a different API: one where user code either marks something, or not, at its discretion. There's no easy way to make that safe.
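For reference, the "keep this alive" API and its effect on cycles can be seen with Rust's standard `Rc`, used here only as a stand-in for any strong-reference or pinning API. The `Node` type and both helper functions are made up for the example; a `Weak` handle is returned purely to observe whether the allocation survived.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    other: RefCell<Option<Rc<Node>>>,
}

/// Builds a <-> b, drops both local handles, and returns an observer for `a`.
fn leaked_cycle() -> Weak<Node> {
    let a = Rc::new(Node { other: RefCell::new(None) });
    let b = Rc::new(Node { other: RefCell::new(Some(Rc::clone(&a))) });
    *a.other.borrow_mut() = Some(Rc::clone(&b));
    Rc::downgrade(&a)
    // `a` and `b` go out of scope here, but each still keeps the other alive.
}

/// Builds b -> a with no back edge; everything is freed when the locals drop.
fn acyclic() -> Weak<Node> {
    let a = Rc::new(Node { other: RefCell::new(None) });
    let _b = Rc::new(Node { other: RefCell::new(Some(Rc::clone(&a))) });
    Rc::downgrade(&a)
}
```

`leaked_cycle().upgrade()` still returns `Some`, because the dead cycle is never reclaimed, while `acyclic().upgrade()` returns `None`.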

Dylan16807 · 2024-07-09T20:08:20 1720555700

> But this means that all of that code that isn't marked `unsafe`, but instructs the GC about what objects to mark or not, is really super unsafe because if it makes a logic error in telling the GC what to mark then the GC will delete an object that it should not have deleted.

By putting it in a GC structure, it would be giving ownership to the GC, and would borrow it back to use. So whenever it's not borrowed, it's safe for the GC to delete it. And the GC won't delete anything outside of its control.

If you have a logic error and prematurely delete, then attempting to borrow the object will return None, which is still safe.
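One way this could work is a heap that hands out generation-checked handles instead of pointers. This is only a sketch under that assumption; `Handle`, `Slot`, `Heap` and their methods are hypothetical names, and a real collector would do its own marking rather than take a list.

```rust
/// Hypothetical handle: a slot index plus the generation it was issued for.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

/// A toy GC-owned heap. Callers never hold raw pointers into it.
struct Heap<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Heap<T> {
    fn new() -> Self {
        Heap { slots: Vec::new() }
    }

    /// Give ownership of `value` to the heap, reusing a free slot if any.
    fn alloc(&mut self, value: T) -> Handle {
        if let Some(index) = self.slots.iter().position(|s| s.value.is_none()) {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            return Handle { index, generation: slot.generation };
        }
        self.slots.push(Slot { generation: 0, value: Some(value) });
        Handle { index: self.slots.len() - 1, generation: 0 }
    }

    /// Borrow the object back. Stale handles yield `None`, never a dangling reference.
    fn get(&self, handle: Handle) -> Option<&T> {
        let slot = self.slots.get(handle.index)?;
        if slot.generation == handle.generation {
            slot.value.as_ref()
        } else {
            None
        }
    }

    /// Free every occupied slot whose current handle is not in `marked`.
    fn sweep(&mut self, marked: &[Handle]) {
        for (index, slot) in self.slots.iter_mut().enumerate() {
            let keep = marked
                .iter()
                .any(|h| h.index == index && h.generation == slot.generation);
            if !keep && slot.value.is_some() {
                slot.value = None;
                // Bump the generation so old handles cannot see a reused slot.
                slot.generation += 1;
            }
        }
    }
}
```

If marking wrongly omits an object, `sweep` frees it and a later `get` on the old handle returns `None`, even after the slot has been reused for something else. That is a logic bug, not a memory-safety bug.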

> It's totally valid to tell RC "I want to point at this object so keep it alive".

> But that's not the API that the GC will give you, unless it's a pinning API. The API where you tell the GC "keep this alive" will prevent the deletion of the garbage cycles I mentioned earlier.

The idea is that any pointing/pinning you do while manipulating an object would be a borrow on the stack. As soon as the function returns, there's no pinning.

So there are a few things that exist in this system:

* A GC-object references another GC-object, keeping the target alive while it is alive. A GC-object can be created by either Rust or JS. The loop you describe would be made up of GC-objects, so it would be straightforward to collect.

* A non-GC object has a permanent pinning reference to a GC-object. These are rare and purposeful.

* (Optional) A non-GC object has a weak reference to a GC-object, which can be collected at any time.

* Rust code has a temporary pinning reference to a GC-object on the stack, while traversing/manipulating it, and it goes away as soon as the function exits. It won't pin too long because the release is automatic. It won't free prematurely because the GC code knows the borrow is happening.
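The last item could be sketched as a guard whose `Drop` releases the pin. This is an assumption about how such an API might look, not an existing library: `Header`, `StackPin`, `may_free` and `manipulate` are all hypothetical, and `Rc` is used only to keep the toy header shareable.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical per-object GC header: number of active stack pins.
#[derive(Default)]
struct Header {
    pins: Cell<usize>,
}

/// Temporary pin held on the stack; released automatically by `Drop`.
struct StackPin {
    header: Rc<Header>,
}

impl StackPin {
    fn new(header: &Rc<Header>) -> Self {
        header.pins.set(header.pins.get() + 1);
        StackPin { header: Rc::clone(header) }
    }
}

impl Drop for StackPin {
    fn drop(&mut self) {
        self.header.pins.set(self.header.pins.get() - 1);
    }
}

/// The collector may free an object only if it is unmarked and unpinned.
fn may_free(header: &Header, marked: bool) -> bool {
    !marked && header.pins.get() == 0
}

/// Reports whether a collection running mid-function could free the object.
fn manipulate(header: &Rc<Header>) -> bool {
    let _pin = StackPin::new(header);
    // A collection that runs here sees the pin and must leave the object alone.
    may_free(header, false)
    // `_pin` is dropped on return, so the pin cannot outlive the call.
}
```

While `manipulate` runs, `may_free` is `false` for an unmarked object; once it returns, the same unmarked object becomes collectable again.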