JavaScript Garbage Collection with WebAssembly Is Possible Today

anderskaseorg · on April 22, 2022

Using a perpetual timer to poll anything is a waste of your users’ battery life. Even if the actual computation you’re doing seems insignificant, the repeated wakeups prevent the processor from reaching deeper sleep states.

https://wiki.ubuntu.com/Kernel/PowerManagement/IdentifyingIs...

eyelidlessness · on April 23, 2022

While a perfectly reasonable point, and one which might benefit other readers, I think it’s worth adding that the article seems fairly clear that it’s demonstrating a proof of concept that can be built upon. I certainly didn’t take it to suggest that I should go write a convoluted setInterval to implement GC in WASM.

Amusingly (to me), I’ve been working on a variety of performance-focused efforts—professionally on directly deliverable tasks, professionally on exploratory spikes for more holistic future improvements, and in pet personal projects. I find myself deliberately writing throwaway code that’s just as painfully suboptimal quite a lot. At worst it helps me quickly iterate on the actual bottleneck without focusing on the things I’m not trying to optimize.

It can be more helpful than that though, when it reveals sometimes unexpected JIT perf characteristics. Sometimes it reveals that a hot loop optimization isn’t going to bear much fruit because the JIT is going to optimize the baseline code better than anything I have in mind. Other times it reveals cases where I see dramatic benefits from an initial JIT pass that vanish because other factors cause a deoptimization.

Granted I think it’s still good to slap a big warning like this on obviously suboptimal code! But I also think people working on performance sensitive code in a dynamic JIT/GC environment should be encouraged to write weird suboptimal throwaway glue code around their perf focal point. You can learn a lot if you’re profiling!

siwatanejo · on April 23, 2022

One thing is lack of performance profiling, another thing is a design flaw (can this performance issue be really tackled or is it a limitation derived from the platform?). Which one are we looking at?

eyelidlessness · on April 23, 2022

We’re looking at neither. I even double checked the article and it’s very explicitly an example demonstrating very clearly a concept. It’s a proof of concept. It’s not a design so it won’t suffer from a flaw, it’s not expected to be profiled because it’s not a solution.

My comment was an aside that such code lacking design can aid profiling other code. Intentionally writing poorly performing code has benefits in understanding other performance details in some scenarios.

You didn’t seem to address this either, just reiterated a grievance with the article. I expect this is going to be a typical HN topic where strong opinions fly by, but I sincerely wonder if you might also benefit from slowing down and reading more closely?

krona · on April 23, 2022

A proof of concept 'is a realization of a certain method or idea in order to demonstrate its feasibility'.

The article presents an 'inventive step' (https://en.wikipedia.org/wiki/Inventive_step_and_non-obvious...) in the form of a POC which nobody should use, and so I don't know what it accomplishes, really.

+1 for C++ though.

tinus_hn · on April 24, 2022

I would hope most sensible browsers would simply stop any polling timers after a short while when the user is not interacting with the page.

Every website ‘needs’ to run stuff like that, there is no alternative to just reigning them in.

pcwalton · on April 23, 2022

How does this handle cycles? It seems to me that if JS holds on to a WebAssembly object, and that WebAssembly object holds on to another JS object that in turn has a reference to the same WebAssembly object, this solution will leak.

This may seem like a contrived scenario, but it happens a lot with callback patterns. You might have code like:

    // MyNativeButton is a native WebAssembly object
    // Perhaps it's part of a WebAssembly GUI framework that renders to canvas
    this.button = new MyNativeButton(); 
    this.button.onclick = () => this.clicked = true;

This will leak the "this" object forever because there's a cycle: "this" -> MyNativeButton -> JS closure -> "this".

brrrrrm · on April 23, 2022

The “WebAssembly object” doesn’t exist as you describe. You probably have an object that wraps WebAssembly function calls and primitive WebAssembly data (individual numbers).

WebAssembly functions can only return numbers and can only take numbers as arguments. Everything else is a wrapper that lives in JavaScript.

MyNativeButton works exactly as you’d expect. If the GC can clean it up, it might. Otherwise you have a leak that you’d have had in normal JavaScript.

eyelidlessness · on April 23, 2022

I’d assume the baseline advice would be the same as I’d give without WASM: if you’re treating `this` as mutable outside of a constructor you’re probably already screwed. Maybe it gets more complicated if you can’t see clearly whether your instance is a value type in WASM, but like your example shows… people generally don’t build value types with this kind of semantics in JS anyway. They’re already incurring those costs. My hope would be that WASM library authors are focused enough on resource usage that they’ll be more mindful of that than the baseline.

kevingadd · on April 23, 2022

mutability isn't really the problem here, even if 'this' is immutable the problem is that you closed over it with your event handler

eyelidlessness · on April 23, 2022

The only closed reference to `this` in the example is a mutation, in an event handler. This is supposed retain the reference and it’s not even a leak. The simplest way to solve that would be to remove the event handler, but even simpler would be to treat `this` as a value type and never change it after construction. The other reference is easily rewritten as a WeakMap key and would not retain a reference, but easier still could be written as a local binding with no reference to `this` at all.

sfink · on April 23, 2022

This is a bit odd. It's polling WeakRefs to simulate FinalizationRegistry. I don't know what problem the author had with FinalizationRegistry, but WeakRef and FinalizationRegistry use the same logic (and mostly the same code) to determine when the target has been collected. (Source: I reviewed the implementations of WeakRef and FinalizationRegistry in Firefox -- https://bugzilla.mozilla.org/show_bug.cgi?id=1587096 for one of the bugs.)

My guess is that it ran into the restrictions on collection visibility. If you look at the target at all (or a WeakRef to it) while running JS, then it will not do the finalization callback until JS code completes running even if the object is no longer reachable. (It's artificially kept alive until JS finishes running.) The example code handles this with the setTimeout call, which allows the JS code to return before polling all the WeakRefs.

It could have been a testing problem? Or maybe it's simpler: there wasn't enough allocation pressure to trigger a GC. JS objects won't get collected until there's a GC. (I guess that's kind of a tautology...)

Although allowed by the spec, it would be very surprising for an implementation to clear a WeakRef at a different time than it calls the FinalizationRegistry callback. (Or did we end up requiring it to be in the same tick? I can't remember. There's some restriction, maybe because the finalizer could check whether the WeakRef target is still there and you wouldn't want them to disagree?)

But in general, it's true that you can't rely on any particular timing of finalization. It'll happen eventually in all browsers if there is continued allocation. (I think all browsers might eventually do an idle GC as well?) Things can also be kept alive for weird internal reasons -- what if something gets JITted and has an IC (inline cache) that refers to an object you expect to be dead? The JS engine might keep that jit code alive for longer than you'd expect, even if it collects other things.

And of course this won't handle cycles. But it is exactly the thing that FinalizationRegistry made it into the language for, for better or worse.

pwdisswordfish9 · on April 23, 2022

> Although allowed by the spec, it would be very surprising for an implementation to clear a WeakRef at a different time than it calls the FinalizationRegistry callback. (Or did we end up requiring it to be in the same tick? I can't remember. There's some restriction, maybe because the finalizer could check whether the WeakRef target is still there and you wouldn't want them to disagree?)

The spec says that both clearing WeakRefs and scheduling finalization callbacks should happen at the same time: <https://tc39.es/ecma262/#sec-weakref-execution>. Of course, since that is merely ‘scheduling’, this means the callback can actually execute in some indefinite future, including never; the callback might as well be scheduled for AD 2101, with the runtime shut down long before then. It has to be done that way, because collection may happen in the middle of execution, but finalization callbacks cannot possibly interrupt it.

The bigger problem with the spec is that when an object becomes eligible for finalization in the first place is… not at all obvious: <https://github.com/tc39/ecma262/issues/2650>.

sfink · on April 23, 2022

Wow, that issue is kind of mindblowing. To me, it proves that making GC observable rips a large hole in the semantics. The damage was ameliorated somewhat with [[KeptAlive]], but the destruction is not contained in any understandable or bounded way, and we'll be bumping into it for years to come.

It feels like we managed to add something a lot like C and C++'s undefined behavior into JS. I thought it was just an issue of nondeterminism, but that issue proves that it could be a lot more problematic than that. I don't find the suggested optimizations particularly far-fetched, especially in a trace-compiled or otherwise specialized subset of the code. Something like inlining in a way that looks for all of the things provable locally even if they may not be true globally; it could totally do things like build alias sets of values and optimize down `w.deref() === obj` to `w.deref() !== undefined`.

brrrrrm · on April 23, 2022

> ran into the restrictions on collection visibility

Is there any more information about this? It seems really subtle!

pwdisswordfish9 · on April 23, 2022

Basically, looking at the target of a WeakRef makes it ineligible for collection until at least the next await, which the spec mandates by way of the [[KeptAlive]] list. This is meant to prevent TOCTOU bugs by foolish programmers doing this:

    if (weakref.deref()) {
        /* garbage collection here */
        use(weakref.deref());
    }

josefx · on April 23, 2022

Isn't that a rather narrow case? What happens on the following, is that protected?

if (weakref.deref()) { /* garbage collection here */ use(heavyComputations(), weakref.deref()); }

pwdisswordfish9 · on April 23, 2022

Yes, as long as there is no intervening suspension (await, or the script otherwise exiting to the event loop).

deathanatos · on April 23, 2022

… why is there an i31 type in that proposal?

djur · on April 23, 2022

"31 bits... is the widest range that can be guaranteed to be efficiently representable on all platforms."

This is because JS numbers are encoded as double floats, and the most efficient representation of an integer in v8 is a "SMI (small integer)", a 31-bit signed integer: https://medium.com/fhinkel/v8-internals-how-small-is-a-small...

progval · on April 23, 2022

I don't know about this proposal in particular, but it is common for garbage-collected languages to use the highest bit to distinguish pointers and integers; which means integers cannot use it.

0x20cowboy · on April 23, 2022

I like webassembly mostly because it doesn’t have garbage collection.

I know the article wasn’t calling for anything like this, but Please keep wasm GC free. Golang, for example, compiles in its own GC (which is why it’s wasm is so large).

You always have to be careful of memory issues when mixing languages - see using Lua or mixing C and Go.

I’d rather deal with memory leaks than fighting garbage collection. If you want GC, maybe just stay in Javascript?

sfvisser · on April 23, 2022

Don't think your comment makes a lot of sense in the context of the article. Author explicitly wants to create/use/drop some WASM instances in the JavaScript world without leaking memory. Piggybacking on the existing JS GC is perfectly reasonable approach here.

0x20cowboy · on April 23, 2022

Well, I said my comment wasn't super related to the article.

I suppose my somewhat hidden comment was that if one wants GC you should build it into the wasm side like golang does. I don't think creating something in one system, and trying to memory manage it in another system is good idea in the first place - at least not something you should do on a real system.

I think playing around with stuff is fun, and I didn't want to come across negative because while I think it's a bad idea, I love playing with bad ideas.

k__ · on April 23, 2022

As far as I understood the WASM people, they will allow the use of the host GC, so in the future you don't have to bring one yourself.

ffhhj · on April 23, 2022

> All 50 created MyClass objects go out of scope and will be cleaned up soon after the above code executes.

It would be nicer to put them in an object pool.