Mostly this sounds like a case of fixing up some suboptimal defaults - the really hard part is gathering data from real-world use cases, which is what you need to make those defaults better.
1) Reducing the V8 heap page size from 1M to 512KB ... results in 2x memory reduction
2) The memory visualization tool helped us discover that the background parser would keep an entire zone alive long after the code was already compiled. ... resulted in reduced average and peak memory usage.
3) C++ compiler not packing structs optimally for V8's case - manual packing saves some peak memory.
Compilers typically align struct fields to a boundary to favor execution speed. That's fine for most use cases, but when there are thousands of instances of a struct, the padding wastes a lot of memory and the tradeoff turns bad.
So C/C++ compilers let you turn this off, either by specifying the data length yourself (bit fields) and/or by marking the struct with __attribute__((packed)) in the case of GCC, for example. In this case the packed attribute resulted in suboptimal packing, so I suppose they went with bit fields to do the manual packing.
Piggybacking on this, can anyone recommend an approach for finding the cause of memory issues in v8?
In my case I'm tuning a 3D game, and the heap usage is a sawtooth pattern with GCs about once per second. Not awful, but it would be nice to smooth it out. But playing with Chrome's various memory profiling tools, I've never been able to discover where the allocations are coming from - the results always seem to be dominated by apparently internal stuff (like entries called "(code deopt data)" or such). Does anyone know any good techniques for such things?
It flips some handy flags that provide useful output, and that output can quickly illuminate issues the regular tools don't (yet).
Running this on the example you linked to below shows that a series of functions are deopting and reoptimizing repeatedly, most likely causing at least some of the sawtooth pattern you see.
Hrm. I've looked at deopts before, and I may be wrong but I think they aren't the issue. As I understand v8, it's normal at startup for hot functions to go through the opt/deopt cycle several times as the engine learns about them - and once a function does it too many times, it gets deopted permanently.
For this reason I always let the game run for 10-15 seconds before I profile, figuring that by that time most of the opt/deopt churn will be finished. And this (very useful) script seems to back this up - I get output like yours at startup, but if I wait a while and then rm the output files, further output is relatively minimal.
So I'm inclined to think that opt/deopt stuff isn't the issue, and there really are lots of JS heap objects getting allocated somewhere. At the same time though, when I used Chrome's built-in memory profiling I see a bunch of deopt-related strings, so maybe I'm way off base. If anyone sees what I'm missing please do clue me in.
> At the same time though, when I used Chrome's built-in memory profiling I see a bunch of deopt-related strings, so maybe I'm way off base
In my experience, these add up real quick and are often indicators of a larger "instability" issue that remains well after the "deopt churn" appears to settle, but continues to manifest in the form of some heavy GC.
Note: many internal structures related to the JIT (IC/hidden classes/code gen etc) can themselves cause sufficient GC pressure, as can the code you described as "de-opted permanently".
Usually that's because you're creating and disposing of a lot of objects in your game loop. Look into initializing a lot of objects at the start and reusing 'em (usually called object pooling).
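A minimal sketch of the pattern (this Vec3Pool class is made up for illustration, not from any particular engine): acquire objects from a free list instead of allocating per frame, so in the steady state the GC has nothing to collect.

```javascript
// Minimal object pool: reuse vector objects instead of allocating
// a fresh one every frame in the game loop.
class Vec3Pool {
  constructor() { this.free = []; }
  acquire(x = 0, y = 0, z = 0) {
    // Reuse a released object if one exists, otherwise allocate once.
    const v = this.free.pop() || { x: 0, y: 0, z: 0 };
    v.x = x; v.y = y; v.z = z;
    return v;
  }
  release(v) { this.free.push(v); }
}

const pool = new Vec3Pool();
function frame() {
  const tmp = pool.acquire(1, 2, 3);  // would otherwise be `new` each frame
  const lengthSq = tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z;
  pool.release(tmp);                  // back to the free list, not garbage
  return lengthSq;
}
console.log(frame()); // 14
```

Once the pool warms up, the loop allocates nothing, which flattens the sawtooth.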
I'd love to take a look deeper - contact info is in my profile if you're interested.
Sorry if I wasn't clear, but the problem is finding the allocations, not fixing them. (They are probably happening in the 3D engine, i.e. code I didn't write.)
One suspects that DevTools' memory profiling should be the place to start, but I haven't found any way to get it to shed any light on where allocations are occurring. So that's what I'm asking about here.
Recent versions of Chrome DevTools have a new profiling feature called 'Record allocation profile' that may help. Enable this around a few of the sawtooth cycles and it will give you a profile based on a sampling of the allocations that happen during that period. The profile includes the stack trace at the time of each allocation, which should help you figure out where the allocations are coming from.
Thanks, this view looks really interesting and I hadn't looked at it deeply.
With that said, do you have any advice on how to use it in practice? Timeline profiles tell me my app goes through maybe 5MB of heap per second, but when I use this feature for say, 5 seconds, it tends to report 2-3 functions as having allocated 16kb each. (And if I run it again, I get similar results but with a different 2-3 functions.) Is it just reporting a very small subset of allocations?
This profiler is sampling based. It takes a sample once every 512KiB allocated (on average, randomized) and reports the sampled allocations still alive at the end of the interval. So yes, it reports only the subset of allocations that were sampled and are 'leaking' past the interval. In that sense it is better at finding memory leaks.
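To illustrate why the numbers look so small, here's a toy model of that kind of sampling scheme (not DevTools' actual implementation): sample on average once per 512 KiB allocated, with randomized gaps between samples.

```javascript
// Toy sketch of sampling-based allocation profiling: record an
// allocation (and its stack trace) roughly once per intervalBytes
// allocated, with exponentially distributed gaps between samples.
const SAMPLE_INTERVAL = 512 * 1024; // 512 KiB average, as described above

function makeSampler(intervalBytes) {
  function nextGap() {
    // Exponential gap with mean intervalBytes -> Poisson-style sampling.
    return Math.floor(-Math.log(1 - Math.random()) * intervalBytes);
  }
  let bytesUntilSample = nextGap();
  return function onAllocation(size) {
    bytesUntilSample -= size;
    if (bytesUntilSample <= 0) {
      bytesUntilSample = nextGap();
      return true;  // this allocation gets recorded
    }
    return false;   // the vast majority are skipped
  };
}

// Simulate 100 MiB of 1 KiB allocations: only ~200 get sampled.
const shouldSample = makeSampler(SAMPLE_INTERVAL);
let samples = 0;
for (let i = 0; i < 100 * 1024; i++) {
  if (shouldSample(1024)) samples++;
}
console.log(samples); // roughly 200, varies run to run
```

So an app churning through megabytes per second will still only show a handful of sampled sites per interval, which matches what you observed.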
If you want to look at all the allocations during the interval, then you can use the 'Allocation timeline' profile – this will give you all the allocations but note that this might have significant overhead.
Thanks for the info. Is there a way to get the Allocation timeline to report about all allocations though? It seems to only report objects that are uncollected (that show up as blue in the timeline). That's useful for finding true leaks, but in my case (trying to fix a sawtooth pattern of heap usage), stuff that was allocated and then quickly GC'ed is exactly what I want to know about. Or am I looking in the wrong place?
You are going to hate this one, but it's effective: divide and conquer. Just strip out features in a binary-chop fashion and rerun the memory profile. Depending on the code, it doesn't take as long as you'd think to get down to small units of functionality - e.g. it sounds like just commenting out the 3D engine's render step would be informative for you.
As I commented to a different reply, skipping the render step does indeed remove the sawtooth, which is why I think the problem is in the 3D engine. Past that, it's not so easy to jump into a three.js-sized chunk of 3rd party code and find atomic things that can be turned off in such a way that everything else still functions.
(I mean, not to suggest that what you describe isn't useful - I'm just hoping to use profiling tools to attack from a different direction.)
>> not so easy to jump into a three.js-sized chunk of 3rd party code and find atomic things that can be turned off
Incidentally, doing a dissection like this is actually an engaging way to learn a big chunk of 3rd party code, compared to just staring at it. By stubbing out various pieces you learn where the joints are and where things are entangled nests of sorrow.
It's not yet, but I'm pretty sure the allocations are happening in the 3D engine I'm using. (I mean the chrome tools aren't telling me that, but when I let the game run without rendering the sawtooth goes away.)
Actually now that I take a second look, some relatively simple demos of the same engine (Babylon.js) show the same sort of behavior. Some rather trivial three.js demos do as well. I might be dealing with something that's just a fact of life for webGL rendering.
Random example (not mine) showing a vaguely similar sawtooth of heap memory usage:
hey tehlike - I work on streak.com and we have some pretty interesting memory challenges being a full blown web app running inside of another very complex web app (gmail). Any interest in poking around and helping us with memory issues?
"Reducing the V8 heap page size from 1M to 512KB results in a smaller memory footprint when not many live objects are present and lower overall memory fragmentation up to 2x."
Is it common to say something's shrunk by 2x? Why not say 0.5x (or 50%, half, etc.)? I understand growth of 2x, and assume this is a mistake, though I'm open to convention.
The x means "times" as in multiply and the "lower" inverts it. You could also say it's halved, been reduced by a half, been reduced by 50% or by 0.5. I think if something has shrunk by 0.5x then this could even be interpreted as it having grown 2x but this would be a pretty odd way to say something.
Mozilla's memory project a while ago showed that there's often a lot of low-hanging fruit if memory hasn't been a big focus. V8 uses a lot of memory in comparison, in my experience. I'd think this should generally come without a downside.
According to their tests it doesn't. The heap memory optimization only triggers on mobile devices with less than 512MB of RAM, and the zone memory improvements shouldn't have a negative effect.
There is no downside to keeping the overall memory arena size down: better cache locality, fewer stop-the-world collections. Node programmers have been asking for this behavior for years.
As mentioned in the article, there is no magic: latency, throughput, and memory consumption are connected, and optimizing for one sacrifices a bit in the other areas.
It depends on your definition of "low-memory". The Kinoma XS JavaScript ES6 runtime[1] is designed specifically for low memory/CPU limited embedded devices, and runs very comfortably on 200 MHz ARM devices with 512 KB RAM.
I've run node tasks that run in pretty low memory... when processing streams I'll often use the command-line option to expose gc, and force it after each item... runs very light that way.
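A sketch of that pattern (processItem and the items here are placeholders): start node with --expose-gc so the global.gc() hook exists, then force a collection after each work item.

```javascript
// Run with:  node --expose-gc process.js
// Forcing a GC after each item keeps the heap small while streaming,
// at the cost of some throughput.
function processItem(item) {
  // ... transform the item, possibly allocating large temporaries ...
  return item.length;
}

const items = ["aa", "bbbb", "c"];
let total = 0;
for (const item of items) {
  total += processItem(item);
  if (typeof global.gc === "function") {
    global.gc(); // only defined when node was started with --expose-gc
  }
}
console.log(total); // 7
```

The typeof guard keeps the script runnable without the flag; it just skips the forced collections in that case.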