Fall cleaning: Optimizing V8 memory consumption (v8project.blogspot.com)
152 points by ivank on Oct 7, 2016 | 39 comments



Mostly sounds like a case of fixing up some non-optimal defaults - the real hard part is gathering data from real-world use cases, which is what you need to pick better defaults.

1) Reducing the V8 heap page size from 1MB to 512KB ... results in up to a 2x memory reduction

2) The memory visualization tool helped us discover that the background parser would keep an entire zone alive long after the code was already compiled. ... resulted in reduced average and peak memory usage.

3) The C++ compiler not packing structs optimally for V8's case - manual packing saves some peak memory.


I'm not familiar with the "packing" concept. How does one go about manual packing? How does it affect maintenance?


Compilers typically align struct fields to a boundary to favor execution speed. That's a good default for most use cases, but when there are thousands of instances of a struct, the alignment padding wastes a lot of memory, making it a bad tradeoff.

So C/C++ compilers let you turn this off, either by specifying the field widths yourself (bit fields) and/or by marking the struct with a directive such as GCC's __attribute__((packed)). In this case the packed attribute apparently still resulted in suboptimal layout, so I suppose they went with bit fields to do the manual packing.


Or you might also rearrange fields of the same type to be together.
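
For illustration, a minimal C++ sketch of both ideas - the structs here are made up for the example, not V8's actual classes, and the sizes assume a typical 64-bit ABI (exact numbers vary by compiler):

  #include <cstdint>
  #include <cstdio>

  // Default layout: the compiler aligns the pointer to 8 bytes, so each
  // bool is followed by 7 bytes of padding -> typically 24 bytes total.
  struct Padded {
    bool a;
    void* p;
    bool b;
  };

  // Reordering groups the small fields into one 8-byte slot -> 16 bytes.
  struct Reordered {
    void* p;
    bool a;
    bool b;
  };

  // Bit fields pack several small values into a single 32-bit word -> 4 bytes.
  struct BitPacked {
    uint32_t kind : 4;
    uint32_t flags : 8;
    uint32_t count : 20;
  };

  int main() {
    std::printf("%zu %zu %zu\n",
                sizeof(Padded), sizeof(Reordered), sizeof(BitPacked));
    // prints "24 16 4" with GCC/Clang on x86-64
  }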



Piggybacking on this, can anyone recommend an approach for finding the cause of memory issues in v8?

In my case I'm tuning a 3D game, and the heap usage is a sawtooth pattern with GCs about once per second. Not awful, but it would be nice to smooth it out. But playing with Chrome's various memory profiling tools, I've never been able to discover where the allocations are coming from - the results always seem to be dominated by apparently internal stuff (like entries called "(code deopt data)" or such). Does anyone know any good techniques for such things?


You may want to give the following script a try:

* https://gist.github.com/krisselden/d3ce3cbb37cc6035b0927fdbf...

It flips some handy V8 flags (presumably the likes of --trace-opt and --trace-deopt, given the output) that provide useful output; this output can quickly illuminate issues the regular tools do not (yet).

Running this on the example you linked below shows that a series of functions are deopting and reoptimizing repeatedly, most likely causing at least some of the sawtooth pattern you see.

example output (likely related to the problem):

  [removing optimized code for: r.getViewMatrix]
  [removing optimized code for: r._isSynchronizedViewMatrix]
  [removing optimized code for: r._computeViewMatrix]
  [removing optimized code for: r._isSynchronized]
  [removing optimized code for: r._computeViewMatrix]
  [removing optimized code for: r._isSynchronizedViewMatrix]
  [removing optimized code for: r._isSynchronized]
  [removing optimized code for: r._computeViewMatrix]
  [removing optimized code for: r._isSynchronizedViewMatrix]
  [removing optimized code for: r._isSynchronized]
  [removing optimized code for: r.getViewMatrix]
  [removing optimized code for: r._isSynchronizedViewMatrix]
  [removing optimized code for: r._computeViewMatrix]
  [removing optimized code for: r._isSynchronized]
  [removing optimized code for: r._computeViewMatrix]
  [removing optimized code for: r._isSynchronizedViewMatrix]
  [removing optimized code for: r._isSynchronized]
  [removing optimized code for: r._computeViewMatrix]
  [removing optimized code for: r._isSynchronizedViewMatrix]
  [removing optimized code for: r._isSynchronized]
  [removing optimized code for: r.getViewMatrix]
---

Note: credit for this should go to @krisselden, the author of the above gist, not me.


Hrm. I've looked at deopts before, and I may be wrong but I think they aren't the issue. As I understand v8, it's normal at startup for hot functions to go through the opt/deopt cycle several times as the engine learns about them - and once a function does it too many times, it gets deopted permanently.

For this reason I always let the game run for 10-15 seconds before I profile, figuring that by that time most of the opt/deopt churn will be finished. And this (very useful) script seems to back this up - I get output like yours at startup, but if I wait a while and then rm the output files, further output is relatively minimal.

So I'm inclined to think that opt/deopt stuff isn't the issue, and there really are lots of JS heap objects getting allocated somewhere. At the same time though, when I used Chrome's built-in memory profiling I see a bunch of deopt-related strings, so maybe I'm way off base. If anyone sees what I'm missing please do clue me in.

(Also: great tip on the script!)


> At the same time though, when I used Chrome's built-in memory profiling I see a bunch of deopt-related strings, so maybe I'm way off base

In my experience, these add up real quick and are often indicators of a larger "instability" issue that remains well after the "deopt churn" appears to settle, but continues to manifest in the form of heavy GC.

Note: many internal structures related to the JIT (IC/hidden classes/code gen etc) can themselves cause sufficient GC pressure, as can the code you described as "de-opted permanently".

Interestingly it is also possible for the above mentioned GC pressure to itself cause some fun (even more GC pressure): https://bugs.chromium.org/p/v8/issues/detail?id=5456

This may not be the root of your issue, but I would be careful about ruling it out entirely too quickly.

Anyways, best of luck!


Usually that's because you're creating and disposing of a lot of objects in your game loop. Look into initializing a lot of objects at the start and reusing 'em (usually called object pooling).
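
A minimal sketch of the idea - Vec3 here is a made-up stand-in for whatever short-lived objects your loop churns through:

  // A tiny object pool; it grows only when empty, so in the steady
  // state the frame loop allocates nothing.
  class Vec3 {
    constructor() { this.x = this.y = this.z = 0; }
  }

  class Pool {
    constructor(factory, size) {
      this.factory = factory;
      this.free = Array.from({ length: size }, factory);
    }
    acquire() { return this.free.pop() || this.factory(); }
    release(obj) { this.free.push(obj); }
  }

  const vecs = new Pool(() => new Vec3(), 100);

  function frame() {
    const tmp = vecs.acquire();
    // ... use tmp for this frame's math ...
    vecs.release(tmp); // hand it back instead of letting the GC collect it
  }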

I'd love to take a look deeper - contact info is in my profile if you're interested.


Sorry if I wasn't clear, but the problem is finding the allocations, not fixing them. (They are probably happening in the 3D engine, i.e. code I didn't write.)

One suspects that DevTools' memory profiling should be the place to start, but I haven't found any way to get it to shed any light on where allocations are occurring. So that's what I'm asking about here.


Recent versions of Chrome DevTools have a new profiling feature called 'Record allocation profile' that may help. Enable this around a few of the sawtooth cycles and it will give you a profile based on a sampling of allocations that happen during that period. The profile will include the stack trace at the time of each allocation, which should help you figure out where the allocations are coming from.


Thanks, this view looks really interesting and I hadn't looked at it deeply.

With that said, do you have any advice on how to use it in practice? Timeline profiles tell me my app goes through maybe 5MB of heap per second, but when I use this feature for, say, 5 seconds, it tends to report 2-3 functions as having allocated 16KB each. (And if I run it again, I get similar results but with a different 2-3 functions.) Is it just reporting a very small subset of allocations?


This profiler is sampling-based. It takes a sample once every 512KiB allocated (on average, randomized) and reports the allocations still alive at the end of the interval. So yes, it reports only the subset of allocations that are sampled and are 'leaking' from the interval - at your ~5MB/s of churn you'd expect roughly ten samples per second, but anything collected before the interval ends never shows up, which is why you only see a handful of small survivors. In that sense it is better at finding memory leaks.

If you want to look at all the allocations during the interval, then you can use the 'Allocation timeline' profile – this will give you all the allocations but note that this might have significant overhead.


Thanks for the info. Is there a way to get the Allocation timeline to report about all allocations though? It seems to only report objects that are uncollected (that show up as blue in the timeline). That's useful for finding true leaks, but in my case (trying to fix a sawtooth pattern of heap usage), stuff that was allocated and then quickly GC'ed is exactly what I want to know about. Or am I looking in the wrong place?


You are going to hate this one but it's effective - divide and conquer. Just strip out features in a binary-chop fashion and rerun the memory profile. Depending on the code, it doesn't take as long as you'd think to get down to small units of functionality; e.g. it sounds like just commenting out the 3D engine's render step would be informative for you.


As I commented to a different reply, skipping the render step does indeed remove the sawtooth, which is why I think the problem is in the 3D engine. Past that, it's not so easy to jump into a three.js-sized chunk of 3rd party code and find atomic things that can be turned off in such a way that everything else still functions.

(I mean, not to suggest that what you describe isn't useful - I'm just hoping to use profiling tools to attack from a different direction.)


Understood.

>> not so easy to jump into a three.js-sized chunk of 3rd party code and find atomic things that can be turned off

Incidentally, doing a dissection like this is actually an engaging way to learn a big chunk of 3rd party code, compared to just staring at it. By stubbing out various pieces you learn where the joints are and where things are entangled nests of sorrow.


Is the game open to the public? I would like to give it a look; I have done some amount of optimization as a hobby.


It's not yet, but I'm pretty sure the allocations are happening in the 3D engine I'm using. (I mean the chrome tools aren't telling me that, but when I let the game run without rendering the sawtooth goes away.)

Actually now that I take a second look, some relatively simple demos of the same engine (Babylon.js) show the same sort of behavior. Some rather trivial three.js demos do as well. I might be dealing with something that's just a fact of life for webGL rendering.

Random example (not mine) showing a vaguely similar sawtooth of heap memory usage:

http://gleborgne.github.io/molvwr/#1GCN


hey tehlike - I work on streak.com and we have some pretty interesting memory challenges, being a full-blown web app running inside of another very complex web app (Gmail). Any interest in poking around and helping us with memory issues?


There are tools that go deeper: you can try about:tracing or the heap visualizer mentioned in the article.


I was confused by the article.

  "Reducing the V8 heap page size from 1M to 512KB results 
  in a smaller memory footprint when not many live objects 
  are present and lower overall memory fragmentation 
  up to 2x."
Is it common to say something's shrunk by 2x? Why not say 0.5x (or 50%, half, etc.)? I understand growth of 2x and assume this is a mistake, though I'm open to convention.


The x means "times" as in multiply and the "lower" inverts it. You could also say it's halved, been reduced by a half, been reduced by 50% or by 0.5. I think if something has shrunk by 0.5x then this could even be interpreted as it having grown 2x but this would be a pretty odd way to say something.


Shrinkage of 2x is the inverse of growth by 2x.

I.e. shrinking 2x and then growing 2x would bring you to the original.

Though in this specific case, I would have simply said "halved".


Because it grew by 0.5x ;)


It's confusing. I'd think it means it shrunk 0-2 times by an indeterminate amount.


Hope they don't adversely affect Node's performance with this tuning for low-memory devices.


Mozilla's memory project a while ago showed that there's often a lot of low-hanging fruit if memory hasn't been a big focus. V8 uses a lot of memory in comparison, in my experience. I think this should generally be without a downside.


Couldn't node simply tell V8 to use the earlier defaults if it doesn't have its own tunings already?


According to tests it doesn't. The heap memory optimization will only trigger on mobile devices with less than 512MB of RAM. The zone memory improvements shouldn't have a negative effect.


There is no downside to keeping the overall memory arena size down: better cache locality, fewer stop-the-world collections. Node programmers have been asking for this behavior for years.


As mentioned in the article, there is no magic: latency, throughput, and memory consumption are connected; optimizing for one, you will sacrifice a bit in the other areas.


What is the relation between the OP and previous work concerning the Ignition interpreter? http://v8project.blogspot.com.es/2016/08/firing-up-ignition-...


JS effectively assumes unlimited memory consumption in the language spec. JS is good, but impossible for low-memory tasks.


It depends on your definition of "low-memory". The Kinoma XS JavaScript ES6 runtime[1] is designed specifically for low-memory/CPU-limited embedded devices, and runs very comfortably on 200 MHz ARM devices with 512 KB RAM.

[1] http://kinoma.com/develop/documentation/technotes/introducin...


Looks like that is ES5 only.


It currently scores 96.8% on the ES6 Compatibility Table hosted by kangax[1], which is pretty good. That's native, no polyfills or transpiler.

EDIT: Kinoma claims ES6 compatibility is at 98% as of 2016-01-02.[2]

[1] https://kangax.github.io/compat-table/es6/ [2] http://www.kinoma.com/develop/documentation/js6/


I've run node tasks that run in pretty low memory... when processing streams I'll often use the command-line option that exposes gc (--expose-gc) and force a collection after each item... runs very light that way.
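
Something like this sketch, for instance (the stream contents are a stand-in for real input; global.gc only exists when the flag is passed):

  // Run with: node --expose-gc pump.js
  const { Readable } = require('stream');

  const items = Readable.from(['a', 'b', 'c']); // stand-in for a real stream

  items.on('data', (item) => {
    // ... real per-item work goes here ...
    if (global.gc) global.gc(); // force a collection after each item
  });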



