function test(text) {
  const a = 1
  function inner() {
    console.log(eval(text)())
  }
  inner()
}
test("() => { return a }")
This prints 1 to the console.
This happens because the closure's context object is shared between all closures in a given scope. So as soon as one variable from a given scope is accessed through a closure, all variables will be retained by all inner functions.
Technically the engines could be optimizing it when no use of eval is detected, or when in strict mode (which restricts eval), but I guess that dynamically dropping values from a closure context based on inner lexical scopes can be a really tricky thing to do and probably not worth the overhead.
[later edit: I re-read the example code, and the issue is a double capture and the author not really understanding the semantics of capture in JS, I've given a more technical breakdown and explanation at https://news.ycombinator.com/item?id=41113481]
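A minimal sketch of that sharing (names are made up; this assumes an engine that, like current ones, keeps one context object per scope rather than one per captured variable):
function outer() {
  const small = 1;
  const big = new ArrayBuffer(100_000_000);
  const usesBig = () => big.byteLength; // captures big
  const usesSmall = () => small;        // captures only small...
  usesBig();
  // ...but both closures share outer()'s single context object, so keeping
  // usesSmall reachable also keeps big reachable.
  return usesSmall;
}
const keepAlive = outer(); // big stays retained through keepAlive's shared context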
Nah, in the es3.1 or 5 era we (tc39) fixed the semantics of eval to make it explicit that scope capture only occurs in a direct/unqualified eval (essentially eval becomes a pseudo keyword).
The big question is scope capture - in JSC I implemented free vs captured variable analysis many many years ago. The primary reason was to be able to avoid allocating the activation/scope object for a function at all, but it also let us handle this case (I'm curious whether JSC still does this).
The problem with doing that is that it impacts developer tools significantly.
(To be continued after changing device)
(Edit: Continuing)
Essentially developers want to be able to debug their code, but you run into issues if they start debugging after page load: if you do optimize this, then anything they may want to inspect may already have been optimized away, and at that point you (the engine dev) can't do anything to bring it back. You could say "from this point on I'll keep things around" (JSC, at least in the past, would essentially discard all generated code when the dev tools were opened). You might also say "if the dev tools are enabled (not open) then I'll always do codegen assuming they'll be needed". Or you might say "if the dev tools are open on page load, generate everything as [essentially] debug code" (which is still optimized codegen, but less aggressive about GC related things).
All of those options work, but the problem you then have is potentially significant changes in behavior if the dev tools are enabled and/or open - especially nowadays where JS exposes various weak reference types where this kind of change directly impacts the behaviour of said weak references. So now the question becomes how much of a behavioural difference am I willing to accept between execution with or without dev tools involved.
It's possible (I haven't worked directly on browser engines in a number of years now) that the consensus has become "no difference is ok" - this kind of space leak is not super common, etc., and the confusion from different behavior with/without dev tools might be considered fairly obnoxious.
At Meta I worked on memlab [1]. This tool is very effective at finding memory leaks in our JavaScript. AFAIK we found one, but only one, such leak that happened in production code. Once discovered it was easily fixed. But understanding this class of issue was important to make sense of the leak report.
I spoke to V8 engineers about this many years ago. JS VMs -could- handle this without leaking, but they would have to do more analysis when compiling code. So are we willing to trade off startup time to fix a somewhat rare kind of memory leak? I don't think anyone has spent the time to study this in depth and collect numbers to weigh the tradeoff.
If suddenly this problem became a big deal and caused a lot of leaks then I think we'd see JS VMs fix it. Perhaps one day if a framework makes this kind of error more frequent.
Just today I (re)discovered FinalizationRegistry, which is a big help if you are worried about what may be left behind. It's quite nice to be able to see a log of objects disappearing.
It's frustrating sometimes waiting for GC to kick in. The Chrome debugger seems to offer a GC now thing, but either it didn't work or it's just a kind of suggestion.
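For reference, a minimal sketch of that kind of logging (the label is made up; whether and when the callback fires is entirely up to the engine's GC):
const registry = new FinalizationRegistry((label) => {
  console.log(`${label} was collected`);
});

let buffer = new ArrayBuffer(100_000_000);
registry.register(buffer, "bigArrayBuffer");

buffer = null; // drop the only strong reference
// Some time after a later GC, the callback may log "bigArrayBuffer was collected".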
You are not free from conservative approximation even with manual memory management. E.g. a map is a trivial example where you can't always know which elements will be used during the lifetime of the program (and only those have to be kept alive).
Not really. It is trivially knowable that () => clearTimeout(id) does not reference bigArrayBuffer.
The design is not because it is hard to know what needs to be kept alive, but rather it is more efficient to use one closure record with id and bigArrayBuffer than two different ones.
And your solution is not adequate: the closure does reference the variable, but it’s possible to have a reference to the closure that exists for some reason other than calling the closure (the second closure does this - it references the closure that references the variable, but won’t call the closure that references the variable). It’s not straightforward to solve that problem, and the solution involves a harder constraint system than just object-to-object reachability.
No, JS engines choose it out of spec compliance. (Source: I worked on a JS engine for 10.5 years.)
And your solution doesn't solve the problem. The problem is you've got one closure that does close over a reference, and another closure that closes over the first closure - not to call it, but because the function pointer carries meaning for the purpose of clearing the timeout.
A JS engine could solve this by saying that the state closed over by a closure is only live if the closure itself is live for the purpose of a call, but defining that is next to impossible given JS’s dynamism (the runtime would have to prove that clearTimeout doesn’t call its argument, which is a crazy high proof burden in JS).
> another closure that closes over the first closure, just not to call it, but because the function pointer carries meaning for the purpose of clearing the timeout.
Isn't the second closure closing over the timeout identifier, which is a bare integer, rather than over the first closure?
I don't know how to square this with your credentials, but the spec does not specify the GC behavior of closures. (The only thing it's ever said about GC was WeakMap/WeakSet in ES2015.)
I'm sorry, I literally do not understand what you are saying.
Java does closures this way. Python does closures this way. Google has considered not doing closures that way in JS. [1]
This is an intentional but extremely optional implementation choice, made for performance reasons. (See Chromium thread.)
If you want to understand what I’m saying, then consider carefully if your solution even fixes the leak. It doesn’t. The original post isn’t clear on this and maybe that’s what you’re confused about.
You're right that the spec doesn't call for a particular closure representation, but it makes only one representation practical due to the shared mutable state of scopes, which is mandated by the spec. This makes JS's closures unlike closures in other languages, which allow for greater flexibility. But that's off topic for the original post - the leak - since that leak would exist even with your proposed separate scopes.
Again: stated most precisely, and as I’ve said more than once in the thread, the leak is that closure #2 closes over closure #1, not that closure #2 closes over a separate variable in the same scope. Making the scopes separate won’t fix the leak.
I don't see this in the example. Closure #1 is never bound in the scope, as far as I can tell. Could you explain how closure #1 is closed over without being bound?
The language specification says to use a single scope (see olliej's detailed comment). You could change it to use separate scopes here but it won't solve the more generic issue of imprecise captures, just this specific case.
I believe the technical term for the property that existing JS engines lack here is "safe for space". The V8 bug (https://issues.chromium.org/issues/41070945) has already been linked elsewhere.
Both of those bugs (especially the JSC one) sketch out possible solutions and give some insight into why this is hard to implement efficiently. In general, it adds a lot of complexity to an already complicated (and performance-sensitive!) chunk of code.
I got confused here, because in the timers example, I assumed that without calling cancelDemo/clearTimeout, we cannot assume that the timer's function is no longer callable. Heck, even WITH clearTimeout... without understanding its implementation, we cannot assume it's no longer callable by some internal system.
I think you should call cancelDemo() in the article to show that yes, even when we assume that the reference to the closure is cleaned up, the allocation persists.
(() => {
  // Assumed implementation of setTimeout in which we cannot assume
  // that fn is inaccessible after the first invocation.
  const timers = [];

  function mySetTimeout(fn) {
    fn();
    // Retain the function for some reason, maybe because of a
    // timers/intervals implementation detail.
    timers.push(fn);
    const id = timers.length - 1;
    return id;
  }

  function myClearTimeout(id) {
    // After this, timers holds no more references to fn,
    // but bigArrayBuffer still gets retained (via the cleanup closure).
    timers.splice(id, 1);
  }

  function demo() {
    const bigArrayBuffer = new ArrayBuffer(100_000_000);
    const id = mySetTimeout(() => {
      console.log(bigArrayBuffer.byteLength);
    });
    return () => myClearTimeout(id);
  }

  globalThis.cancelDemo = demo();

  // Even when calling clearTimeout, bigArrayBuffer is still allocated,
  // which is the crux of the article.
  cancelDemo();

  // Now it's actually deallocated, as shown in the article.
  delete globalThis.cancelDemo;
})();
It would be intuitive to me if function closures only retained things they reference
It would also be intuitive to me if closures naively retained everything in scope
It's bizarre to me that the behavior is "if one closure references something, then all of them retain it"
I guess maybe it's a stack vs heap thing? If nothing retains a variable then it can be kept on the stack, but once it has to outlive the function it has to be moved. Still, it's odd the bookkeeping can't distinguish closures that reference it from ones that don't, if it already has to check that for the entire set.
But I suppose it's more about what you're tracking? For example, instead of tracking reference counts to each variable, they're tracking reference counts to each scope. Then, if a scope has no more references, the variables it owns are cleaned up.
Ok, so there have been a lot of comments on what causes this, which are not correct, and a few that are pretty close. Here is the actual explanation of what is happening and what is causing the leak.
For context, I used to be on TC39 (the ecmascript standards committee) and spent many many years working on JSC, and specifically working on the GC and closure modeling.
First off: this is not due to eval. In ES3.1 or ES5 (alas I can't recall which) we (tc39) clarified the semantics of eval to only evaluate in the containing scope if it is called directly - essentially turning it into a pseudo operator (implementations today generally implement a direct eval as `if (target function == real eval function) { do eval } else { call the function }`). Calling eval in any way other than `eval(<expression>)` will not invoke the scope capturing behavior of eval (this is a strict requirement to allow fast access to non-local variables).
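A quick illustration of the direct vs indirect distinction (a sketch, with made up names):
function evalDemo() {
  const secret = 42;
  eval("console.log(secret)"); // direct eval: sees evalDemo's scope, logs 42
  const indirectEval = eval;
  indirectEval("console.log(typeof secret)"); // indirect eval: runs in global scope, logs "undefined"
}
evalDemo();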
The function being reported as exhibiting the bad/unexpected behavior in the post is:
function demo() {
  const bigArrayBuffer = new ArrayBuffer(100_000_000);
  const id = setTimeout(/* timeout closure */ () => {
    console.log(bigArrayBuffer.byteLength);
  }, 1000);
  return /* cleanup closure */ () => clearTimeout(id);
}
If we were to follow the spec language fairly explicitly, the behavior of this function is (eliding exact semantics of everything other than creation of function objects and closures)
1. enter the function
2. env = create an empty lexical environment object (I may use "activation" by
   accident because that was the spec language when I was first working on JS engines)
   a) set the parent scope of env to the internal scope reference of the callee
      (in this case, because demo is a global function, this will be the global object)
   b) add a property "bigArrayBuffer" to env, setting the value to undefined
   c) add a property "id" to env, setting the value to undefined
3. evaluate `new ArrayBuffer(100_000_000)` and assign the result to the
   "bigArrayBuffer" property of env
4. construct a function object for the timeout closure, and set its internal
   scope reference to *env* (i.e. capture the containing scope)
5. call setTimeout passing the function from (4) and 1000 as arguments, and
   assign the result to the "id" property on the env object
6. construct the cleanup closure, and set its internal scope property to env
The result of this is that we end up with the following set of objects:
globalObject = {.....}
demo = Function { @scope: globalObject }
<demo_env> (not directly exposed anywhere) =
    LexicalEnvironment { @scope: demo.@scope,
                         bigArrayBuffer: big array,
                         id: number }
<timeout closure> = Function { @scope: demo_env }
<cleanup closure> = Function { @scope: demo_env }
At which point you can see as long as either closure is live, the reference to bigArrayBuffer is reachable and therefore kept alive.
Now, I was confused about this report originally, as I know JSC at least does do free var analysis (and I can't imagine v8 doesn't, not sure about SM these days) to reduce false captures - I had not properly read the example code and was left wondering "why is this being kept alive". But having actually read the code properly and written out the above, it's hopefully very obvious to everyone now.
The language semantics of JS mean that all closures in a given scope chain share that scope chain, which means if one closure captures a variable, then all closures will end up keeping that capture alive, and there is not a lot the JS engine can do to limit that.
There are some steps that could be taken to mitigate or reduce this, but that kind of flow analysis can become expensive. A real issue JS engines have is that the overwhelming majority of JS runs a tiny number of times and is extremely latency sensitive (this is why JSC has put so much effort into parsing + interpreter perf), so any real data flow analysis is too expensive for such code; and by the time code is hot enough to have warranted that kind of analysis, the overall program state has got to a point where you cannot retroactively remove closure references, so they remain.
Now, something that you _could_ try as a developer in this kind of scenario is to use let inside a nested block scope, to reduce the sharing of scopes, e.g.
function demo() {
  let id;
  {
    let bigArrayBuffer = new ArrayBuffer(100_000_000);
    id = setTimeout(/* timeout closure */ () => {
      console.log(bigArrayBuffer.byteLength);
    }, 1000);
  }
  return /* cleanup closure */ () => clearTimeout(id);
}
which might resolve this issue, in this particular kind of case.
In principle an engine could introduce logic to try to track exactly how many live closures reference a captured variable, but this is also tricky as you could easily end up with something like:
function f() {
  let x = new GiantObject;
  return (a) => {
    if (a) return (g) => { g(x) }
    return (g) => { g(null); }
  }
}
y = f()          // y needs to keep x alive
y = y(someValue) // you get a new closure which may or may not
                 // be the one referencing x
This is something you _could_ support, but there's a lot of complexity to ensuring correct behavior and maintaining performance in all the common cases, and it's possibly just not worth it given the JS capturing model.
There are also a few things you could do that would likely be relatively easy/low cost from a JS engine that would remove some cases of excessive capture, but they'd still just be helping super trivial cases like this reduced example code, not necessarily any actual real world examples.
Thanks for the explanation. I think I understand why the leak happens with the problematic case. But why doesn't the last example before the problematic example leak? Doesn't the cleanup closure still retain the demo_env?
This is the example I mean:
function demo() {
  const bigArrayBuffer = new ArrayBuffer(100_000_000);
  const id = setTimeout(() => {
    console.log('hello');
  }, 1000);
  return () => clearTimeout(id);
}
globalThis.cancelDemo = demo();
[Edit]
The blog post got updated with some links to other articles, which made it clear to me now. It looks like variables are only saved in the context/env if they are used in an inner function. One of the blog posts says that this is an implementation detail, but at least it seems to work like this in FF/SM and V8.
I didn't want to go into implementation details as they'd complicate matters :D
What you're seeing here is the big difference between specification language and implementation - the major JS engines perform free variable analysis and only put variables that are actually captured into the activation object. This means that in the example `bigArrayBuffer` is simply never stored on or referenced by any object and becomes unreachable the moment demo() returns. Engines can do this because it's essentially unobservable (though the addition of weak references technically undermines this claim; mercifully no one has yet decided that the existence of a closure that _could_ hold an object should be required to keep it alive).
Anyway, the big reason for performing free variable analysis is not to save memory/prevent space leaks as discussed in this article, but to make the function run faster. The first (and biggest) thing you win is that if nothing is captured in a function with closures you don't have to create the lexical environment object:
function my_sort(arr) {
  // ... imagine there are some variables or something ...
  return arr.sort((a, b) => { if (a < b) return -1; return 1 })
}
There is a lot of code that uses non capturing closures like this, but without doing any free variable analysis the my_sort function would have to allocate a new lexical environment object on every call, even though nothing is actually captured.
That environment object allocation was the bit that was expensive enough to warrant doing the free variable analysis when I implemented it in JSC - the time required to perform the free variable analysis is less than the time spent allocating and collecting unnecessary lexical environment objects. As a bonus, once you have performed the capture analysis you can also use that information for a bunch of other low cost/complexity perf improvements as well.
Could it be that the vector is hoisted outside of the function's inner scope, promoted to a literal-like datum attached to the outer function?
It's obvious from the function that the ArrayBuffer object never escapes from the scope, and is never modified.
If the object is never modified, there is no need to keep newly instantiating it; it can be hoisted to the function and attached to it somehow, so then to get rid of it, we have to lose the function itself.
Can reproduce in latest Firefox and Chromium. I wonder whether this is an actual leak or there is a good reason for JS engines to retain the array buffer.
How can I try this myself? i.e. see bigArrayBuffer in memory and see if it is/isn't garbage collected. I am guessing I can use the Chrome debugger, but I would love to know how to do it step by step, if anyone has a link.
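Not a full guide, but a rough way to poke at it in Chrome DevTools (a sketch; this assumes the article's demo() has been pasted into the console first, and relies on heap snapshots triggering a GC before capturing):
// In the DevTools console, after pasting the article's demo():
globalThis.cancelDemo = demo();
// Memory panel -> "Take heap snapshot" -> search/filter for "ArrayBuffer";
// the 100 MB buffer shows up, retained via the closures' shared context.
globalThis.cancelDemo = null;
// Take a second snapshot; the buffer should no longer be there.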
As an old OO guy, I like to think of a closure as syntax sugar for a class that’s being generated which has fields for all the variables in a scope that are used in callbacks. In those terms, this is quite a bit less surprising. (That said, you could also imagine generating a separate class for each callback - I wonder why JS engines don’t do that)
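In those terms, a rough sketch of that desugaring for the article's example (purely illustrative, not what engines actually emit):
// The shared "environment" becomes one generated object holding every
// captured variable; each callback keeps a reference to it.
class DemoEnv {
  constructor() {
    this.bigArrayBuffer = null;
    this.id = null;
  }
}

function demo() {
  const env = new DemoEnv();
  env.bigArrayBuffer = new ArrayBuffer(100_000_000);
  env.id = setTimeout(() => console.log(env.bigArrayBuffer.byteLength), 1000);
  // The cleanup callback only needs env.id, but holding it keeps all of env alive.
  return () => clearTimeout(env.id);
}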
The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said "Master, I have heard that objects are a very good thing - is this true?" Qc Na looked pityingly at his student and replied, "Foolish pupil - objects are merely a poor man's closures."
Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire "Lambda: The Ultimate..." series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.
On his next walk with Qc Na, Anton attempted to impress his master by saying "Master, I have diligently studied the matter, and now understand that objects are truly a poor man's closures." Qc Na responded by hitting Anton with his stick, saying "When will you learn? Closures are a poor man's object." At that moment, Anton became enlightened.
> I'll take some koanic license and combine Norman Adams (alleged source of "objects are a poor man's closures") and Christian Queinnec ("closures are a poor man's objects") into a single great Zen language master named Qc Na.
These days Chrome has a "memory saver" feature that deactivates inactive tabs and frees their memory. It's something Apple pioneered in mobile Safari from the very beginning. For people like me who keep 500+ tabs open always, it's a great feature.
TLDR: the entire scope for a closure is retained as long as that environment might still be referenced. There's no such thing as a partial scope in JavaScript (to my knowledge, please correct if wrong).
In the example, if you don't capture `id` in the returned closure, the problem goes away.
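For instance, a variant of the article's example where the returned closure captures nothing from demo's scope (a sketch; with the capture analysis engines already do, only the pending timeout keeps the buffer alive until it fires):
function demo() {
  const bigArrayBuffer = new ArrayBuffer(100_000_000);
  setTimeout(() => {
    console.log(bigArrayBuffer.byteLength);
  }, 1000);
  // The returned closure references nothing from this scope, so once the
  // timeout has fired, nothing keeps bigArrayBuffer alive.
  return () => console.log("nothing to cancel");
}
globalThis.cancelDemo = demo(); // holding this no longer pins the buffer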
> There's no such thing as a partial scope in JavaScript (to my knowledge, please correct if wrong).
There's also no such thing as garbage collection in JavaScript. If you look purely at the language spec, then the language behaves as if there is infinite memory.
Garbage collection and how that interacts with scopes is purely an implementation detail. A conforming JavaScript implementation definitely could determine that a given local variable captured by a closure will no longer be accessed and free the associated memory. It's just that the implementation tested by the author (I'm assuming Chrome) doesn't do that.
> There's also no such thing as garbage collection in JavaScript.
Incorrect. The details of garbage collection are left to the implementation, but the spec itself at least implies that GC is expected of implementations [0]. The current spec has an entire section on memory management [1] with several constructs for garbage collection and many explicit references to GC.
That being said, it would be more correct to say that this implementation of JavaScript does not perform the capture analysis that would effectively create partial scopes.
Even so, you can have an entirely conforming JavaScript implementation with no GC. It just means that WeakRefs and WeakMaps will always continue to hold their references.
I know the debugger cannot always show you variables from the originating scope, or even completed blocks above the breakpoint, so it's clear that some data not used in the closure becomes unreachable while the closure is still live.
But I’ve also spent time hunting down giant memory leaks where some things did get preserved that should not have. I’m so glad domains have finally gone from “deprecated” to deprecated (telling people not to use a feature when you haven’t provided the replacement is fucking stupid) as most of that kind of problem that I can recall involved domains in some manner.