Typical problem of over-tuning. Not every tunable needs to be tweaked (or, worse, cut'n'pasted from somewhere on the internet). Your street cred as an admin does not depend on how many settings you can change. If they had just kept using the defaults, everything would have been fine.
That said, 250MB is really a lot of code. Their binaries must be gigantic. This by itself likely already causes performance problems, because all the caches will be thrashing.
Yes, but he said it was previously only 50MB. That 250MB contains a lot of duplicated compiles: tiered compilation can generate not just two but, I think, up to 5 different compiles of the same method! I think code cache management in HotSpot is not that great: eventually the server would probably converge on somewhat more than 50MB due to the more aggressive inlining, but 5x suggests the low-value compiles aren't being properly discarded.
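If you want to see what the cache actually converges on, one option (a sketch, assuming HotSpot; app.jar is a placeholder) is to have the JVM print a code cache summary when it exits:

    java -XX:+PrintCodeCache -jar app.jar    # prints code cache size/used/free on JVM exit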
To the author: It would be useful to put a little blurb paragraph at the top to say that if you haven't overridden the default codecache size via JVM options, no action is necessary. This information is buried way down at the bottom of the page, which may lead hurried readers to assume that something may already be wrong with their Java 8 JVMs.
Would a good approach to major JVM upgrades in production be to remove all but critical JVM flags in the new version, and run it on a subset of nodes, then?
After getting a baseline measurement of performance with JVM defaults, experiment with tuning a couple of settings, measure some more, and make sure nothing breaks under load. Repeat until the new JVM version seems stable / performs better / etc. even under worst-case load, and then upgrade all nodes to the latest and greatest?
Yes. The good approach when you're given a legacy Java application with tons of flags is to remove ALL the optimization flags for the heap and the GC.
Most of the settings found in legacy projects or on the internet are either obsolete or defaults in the latest version of the JVM.
The only mandatory heap flags are "-Xms???M -Xmx???M", which set the heap size of the application to ??? megabytes.
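For example (a sketch; the 4096M value and app.jar are just placeholders):

    java -Xms4096M -Xmx4096M -jar app.jar    # heap pinned at 4096 MB, no resizing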
I wonder how many projects have production issues because they are not able to hit expected production volume in a QA environment? This is certainly the cause of a lot of grief I've experienced.
For us, it's not production volumes that are the main issue but production dynamics, i.e. how the volumes change over time. It's such a big issue that standard practice is to run QA against live production data by default and manage the consequences.
The downside is that when we do need to use test data, e.g. when there's a large change to the structure of the upstream or we want to run stress scenarios, it's a pain in the backside.
We struggle with this too. There are so many ways the dynamics can change--daily/weekly/seasonal cycles, robots, client caching, etc. One best practice we use for high-volume services is to minimize the variance in call mixtures--if there are two calls with vastly different call patterns, it's probably worth it to split them into separate services, so you can tune throttling, GC, load balancing, etc. specifically for those calls, instead of having to tune them to support both calls (which is often difficult or impossible to do). Of course, it's hard to predict how your service will evolve over time, so making the split is often painful for you and your clients. Some of our services can't be handled by a single load balancer, so we use DNS round robins, which have a whole other class of problems when you have mixed call patterns. Gotta earn your pay...
Some other techniques we use are one-box deployments that receive a proportion of production traffic and "bake" new changes before deploying to the whole fleet, and shadow fleets which let you tune and test against live traffic. We've found that simply replaying production traffic at higher volumes sometimes isn't sufficient, because our calls don't necessarily scale that way (some of them scale with upstream traffic, some of them scale by downstream fleet sizes due to client caching).
How much Java source code does it take to need 256MB of codecache? The author says they're using a service architecture where each transaction uses about 20 services. There's no indication of why their program is so huge.
Author here. We know 128MB codecache was not enough and 256MB was sufficient. I think we could have gotten by with less, e.g. 200MB, but we stopped experimenting once we found a value that worked.
We don't have a complete understanding of why this app uses so much more codecache than other apps we've switched to Java 8. Now that we expose the codecache size in Datadog, I may try plotting code size vs codecache size for a variety of our apps.
Datadog employee here: you don't need to run your own collector to collect codecache usage. Since it's exposed through JMX, the JMX collector can collect it (http://docs.datadoghq.com/integrations/java/). It doesn't by default, though, so you'll need to configure it to collect the metrics from the "java.lang:name=Code Cache,type=MemoryPool" bean. We should add that to our default configuration.
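If you'd rather sanity-check the same numbers from inside the JVM, that bean is just the standard memory pool MXBean; a minimal sketch (assuming Java 8, where the pool is literally named "Code Cache"):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    public class CodeCachePrinter {
        public static void main(String[] args) {
            // Same data the "java.lang:name=Code Cache,type=MemoryPool" bean exposes over JMX
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getName().contains("Code Cache")) {
                    System.out.printf("%s: used=%d max=%d%n",
                            pool.getName(), pool.getUsage().getUsed(), pool.getUsage().getMax());
                }
            }
        }
    }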
Codecache is for compiled code, so it doesn't necessarily correlate to the original program's source code size. You can have the same methods inlined in many places, load the same classes in multiple class loaders (which means they are separate classes to the JVM), generate code at runtime, etc.
For example, Presto is a SQL query engine that generates code for each query (a SQL query is effectively a program), so it can need a lot of codecache depending on the query rate and concurrency.
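To make the class loader point concrete: loading the same class through two independent loaders gives the JVM two distinct classes, and each gets its own compiled code. A sketch (the classpath URL and class name are hypothetical):

    import java.net.URL;
    import java.net.URLClassLoader;

    public class TwoLoaders {
        public static void main(String[] args) throws Exception {
            URL[] cp = { new URL("file:/tmp/classes/") };   // hypothetical classpath entry
            ClassLoader a = new URLClassLoader(cp, null);   // null parent: no delegation,
            ClassLoader b = new URLClassLoader(cp, null);   // so each loader defines its own copy
            Class<?> c1 = a.loadClass("com.example.Foo");   // hypothetical class name
            Class<?> c2 = b.loadClass("com.example.Foo");
            System.out.println(c1 == c2);                   // false: two separate classes to the JVM
        }
    }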
Now that's something to look for. If something is generating code for each transaction, and that code shares a cache with other, more permanent code, cache thrashing is possible. If the cache management favors new code over old code, old code is likely to be pushed out as the cache fills up with query code. Then the old code gets recompiled the next time it's needed. Could that be why so much time is being spent in the JIT compiler?
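One way to check (a sketch; HotSpot flags, app.jar is a placeholder, and -XX:+CITime availability may vary by build):

    java -XX:+CITime -jar app.jar              # prints cumulative JIT compiler time at exit
    java -XX:+PrintCompilation -jar app.jar    # logs every compile; the same methods showing up repeatedly is the tell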
> You can have the same methods inlined in many places
Isn't the JVM more conservative about inlining already inlined methods? I can't find a cite right now, but I swear I've seen this when inspecting the compiler output.
Yes,
if the method was already compiled into a big or medium-sized method (in terms of assembly code), the JIT will not try to inline it again; otherwise you'd get too much code duplication, and nobody wants extra cache misses from the instruction cache.
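You can watch those decisions with the diagnostic inlining trace (a sketch; app.jar is a placeholder):

    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintCompilation -jar app.jar
    # messages along the lines of "already compiled into a big method" show the case described above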
So the unspoken thing seems to be that they kept the same JVM arguments as Java 7, and that caused problems in Java 8? And they had a JVM argument that was setting a value to the same as the default?
We knew to adjust other JVM settings (e.g. PermGen replaced by Metaspace) but we overlooked that (1) we used a non-default max codecache size and (2) the default for that setting was 3x higher for Java 8.
An unfortunate thing about using non-default JVM settings is that you need to scrutinize them every time you switch Java versions. If I ever go through this again, I will know to pay more attention to every JVM setting we override.
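One habit that helps (a sketch; the paths stand in for whichever launchers you have installed): dump the effective flags on both versions and diff them, so an overridden value that used to be an improvement stands out once the new default passes it:

    /path/to/java7/bin/java -XX:+PrintFlagsFinal -version | grep -i codecache
    /path/to/java8/bin/java -XX:+PrintFlagsFinal -version | grep -i codecache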
-XX:-TieredCompilation is the magic option to disable tiered compilation.
Beginning with Java 8, instead of having the VM magically choose between the client JIT (c1: think V8) and the server JIT (c2: think gcc -O2), the default configuration is to run in so-called "tiered mode": it first starts with the interpreter (as usual), then c1 (which also keeps profiling info here), then c2. Because the code is compiled twice, you need a code cache that's twice as big.
From my own experience, tiered compilation is nice when you run something interactive like an IDE (IntelliJ IDEA) and useless when you run a server app.
That said, I've never had to have a 250MB code cache.
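For reference, the two knobs involved (a sketch; app.jar is a placeholder, and 256m just mirrors the value the author mentions):

    java -XX:-TieredCompilation -jar app.jar              # old behaviour: interpreter + c2 only
    java -XX:ReservedCodeCacheSize=256m -jar app.jar      # keep tiered mode but give it a bigger code cache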
If the JVM has one bytecode compiler that it uses only during startup and another that it uses the rest of the time, shouldn't it dump the bytecode generated by the first compiler from the cache at the point when it switches to the second compiler?
The article's wording is a bit misleading here. The first-tier and second-tier compilers are always available throughout the entire lifetime of the program. The first-tier compiler performs very few optimizations, but compiles very quickly. It is a simple and literal compiler, essentially the most direct translation of bytecode to machine code. It's meant to be a quick win to get out of interpreted mode, which is vastly slower. The second-tier compiler is slower, but also takes advantage of the data collected during runtime profiling to make optimizations. Basically, there's a trade-off between the time it takes to run the second-tier compiler and the time it will save. By comparison, the first-tier compiler is almost always a win over interpreted code, even if the code is only run a small number of times.
Now, sometimes optimizations can be "wrong", in the sense that something new has happened that invalidates an assumption used during second-tier compilation. Here's a real-world example: I have an interface and only one loaded class that implements that interface. The second-tier compiler will use this information to basically do this:
    void doStuff(MyInterface a) {
        if (a.getClass() != MyClass.class)
            deoptimize();
        // Do everything from here on assuming
        // that a is an instance of MyClass.
        // This includes inlining simple getter
        // methods to pure field accesses, etc.
    }
Now, when a second class that implements `MyInterface` is loaded and makes it to that function, it will hit the `deoptimize` branch and go back to first-tier or even interpreted mode. Eventually the function will be recompiled by both tiers with the new assumption -- two implementing classes.
So, in the case of deoptimization, it can be a win to keep the first-tier code to fall-back to.
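If you want to watch that happen, here's a minimal sketch (hypothetical class names) that usually reproduces it; run it with -XX:+PrintCompilation and look for doStuff being marked "made not entrant" once the second implementation appears:

    interface MyInterface { int value(); }
    class MyClass implements MyInterface { public int value() { return 1; } }
    class OtherClass implements MyInterface { public int value() { return 2; } }

    public class DeoptDemo {
        // The call site the JIT will specialize while MyClass is the only implementor it has seen
        static int doStuff(MyInterface a) { return a.value(); }

        public static void main(String[] args) {
            long sum = 0;
            MyInterface first = new MyClass();
            for (int i = 0; i < 1_000_000; i++) sum += doStuff(first);   // warm up: compiled assuming MyClass only
            MyInterface second = new OtherClass();
            for (int i = 0; i < 1_000_000; i++) sum += doStuff(second);  // invalidates the assumption, triggers deopt
            System.out.println(sum);
        }
    }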
I think you meant the machine code. The Java compiler produces bytecode from Java source. The C1 and C2 bytecode compilers in the JVM convert bytecode to machine code.
Yes, I agree if C2 recompiles something C1 already compiled, the C1-compiled machine code should be freed from the codecache.
It will be freed, but when c2 has emitted the new machine code you can still have frames on the stacks of user threads that are executing the code compiled by c1.
Once no threads are using the c1-generated code any more, it will be swept from the code cache.
Yikes! Can somebody with deep AWS knowledge comment on how this impacts AWS Lambda (Java) and AWS Elastic Beanstalk applications? What configuration is recommended?