
Is there a good decision flowchart on how to choose a GC for a particular task/application?



Always start with the question: do you need to optimize? :) Probably not! Java is _fast_, and the JVM is pretty good at what it does; 95% of the time, merely checking for GC thrashing is all that's needed.

Otherwise, start with G1 and get your Xmx value in the ballpark. VisualVM can help you determine whether you're thrashing. Are you GCing like 10+ times per second? Keep an eye on it. If you start hitting giant pause times and 50+ collections a second, you've got problems :) Increase Xmx. (And no, please don't set Xms = Xmx.)
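For what it's worth, you can poll the same counters VisualVM graphs from inside the process, using the standard GarbageCollectorMXBean API. A minimal sketch (the class name is mine):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcWatch {
    // Prints cumulative collection counts and total pause time per collector;
    // sample this periodically to spot thrashing (e.g. 10+ collections/sec).
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```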

If you have issues past that, it's not the garbage collector that needs help; the next step is to audit your code base. Bad code makes any garbage collector in any language misbehave.

For instance, are you `select *`ing from a database table and then using the Java streams API to filter the rows? That will cause GC issues :) Fix that first.
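To illustrate the shape of that antipattern with plain collections (a stand-in for real JDBC result sets; names and sizes are made up): materializing every row and then filtering churns through far more short-lived objects than never materializing the unwanted rows at all, which in SQL means a WHERE clause.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PushDownFilter {
    // Antipattern: "fetch everything", then throw most of it away in Java.
    // Every discarded "row" was still allocated, adding GC pressure.
    static List<Integer> filterAfterFetch(int rows) {
        List<Integer> all = IntStream.range(0, rows).boxed().collect(Collectors.toList());
        return all.stream().filter(r -> r % 1000 == 0).collect(Collectors.toList());
    }

    // Better: filter at the source so unwanted rows are never materialized
    // (the in-database equivalent is SELECT ... WHERE).
    static List<Integer> filterAtSource(int rows) {
        return IntStream.range(0, rows)
                        .filter(r -> r % 1000 == 0)
                        .boxed()
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filterAfterFetch(10_000).equals(filterAtSource(10_000)));
    }
}
```

Both return the same result; only the allocation behavior differs.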

So, if you've gotten to this point and you still need to optimize: what we've done is simply run through the different collectors under load. One of our JVMs is a message broker (ActiveMQ 5.16.x), and we push a couple thousand messages per second through it. We found that Shenandoah actually improved latency for our particular use case, which was more important than throughput and outright performance.

Oh, and if your application and use case is _extremely_ sensitive to latency, forget everything I wrote and contact Azul Systems about their Prime collector. They're pretty awesome folks.


Do you mind explaining, to someone who doesn't do any Java dev but operates some Java servers (like Elasticsearch, etc.), what's wrong with Xms = Xmx?

And what is a better strategy for setting it?


This Stack Overflow answer is pretty decent: https://stackoverflow.com/a/23284897

I think the argument is basically that by setting the min bound lower, you allow the JVM to shrink the heap. This could be beneficial for reducing pause times, because the JVM has less memory to manage overall. That said, the same SO answer also mentions:

> Sometimes this behavior doesn't give you the performance benefits you'd expect and in those cases it's best to set mx == ms.

I've also seen apps configured this way professionally for similar reasons. You might imagine some app that leads to the JVM pathologically trimming the heap in a way that isn't desirable and thus impacts performance in some subtle way, etc. The answer with a lot of this stuff is usually try both ways, measure, see if you can observe a meaningful difference for your apps/workload for typical/peak traffic.

Background: I've worked on Java apps for a few years at reasonable scale and worked on GC pressure issues in that time.


Where to start... First, the other commenter is correct. If you need to run to the convenience store that's 250 yards away, what's more efficient: firing up a 7,100 hp Union Pacific diesel locomotive and plowing through everything and everyone to get there, or taking an electric scooter? A smaller heap means less to scan, meaning less work, meaning lower latency. Now, G1GC removes some of this penalty thanks to its algorithm, but in general, less memory means less memory scanned, meaning better best-case performance from a GC. Let that adaptive GC boundary do its thing.

And now, rant time: Xms allegedly improves start times. Is that really important? No. Is it even really true? Not really. Does Xms hide problems? Yes. Give that memory to the OS; let it fill it up with things like disk cache or COW artifacts. Xms is an attempt to help people who aren't planning properly or testing. Instead, test your systems under full load with Xms off, then adjust, measure, experiment, repeat.

Elastic is sort of a special case because it's a 'database', but I'd rather know the minimum Xmx my system actually needs through experimentation, and you can't find that with Xms enabled. And even then, I don't see MySQL allocating 100% of its InnoDB buffer pools at startup... :)
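To make the "adjust, measure, experiment, repeat" loop concrete, here's a rough command-line sketch; `app.jar` and the heap sizes are placeholders, and `-Xlog:gc*` is the JDK 9+ unified GC logging flag:

```shell
# Start generous, with only -Xmx set (no -Xms); log GC activity to gc.log
# so you can see pause times and collection frequency under full load.
java -Xmx4g -Xlog:gc*:file=gc.log -jar app.jar

# Rerun the same load test with smaller caps until pauses blow up or you
# OOM; the last comfortable value is the minimum Xmx the workload needs.
java -Xmx2g -Xlog:gc*:file=gc.log -jar app.jar
```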


You scan the live set (which most often does not depend on the heap size). A bigger heap most often means better performance. Setting Xms to Xmx is good and valid if you know how much memory you need (which is often the case).

  If you prefer high throughput, then relax the pause-time goal by using -XX:MaxGCPauseMillis or provide a larger heap. If latency is the main requirement, then modify the pause-time target. Avoid limiting the young generation size to particular values by using options like -Xmn, -XX:NewRatio and others because the young generation size is the main means for G1 to allow it to meet the pause-time. Setting the young generation size to a single value overrides and practically disables pause-time control. [1]

  Setting -Xms and -Xmx to the same value increases predictability by removing the most important sizing decision from the virtual machine. However, the virtual machine is then unable to compensate if you make a poor choice. [1]
[1] https://docs.oracle.com/en/java/javase/19/gctuning/introduct...


>>You scan the live set (which most often is not dependent on the heap size).

This was an eye-opener for me: GC performance depends not on the amount of garbage but on the size of the live set (finalizers excluded).


> However, the virtual machine is then unable to compensate if you make a poor choice.

> Test your systems under full load with Xms off and adjust, measure, experiment, repeat.

precisely what I'm saying.


honestly none of these sound like compelling reasons not to just set -Xms to -Xmx


If you run it in a container, the best advice is to leave things at the default options. The JVM has ergonomic GC selection and adaptive heap sizing. 95% of the time you don't need to change anything at all.


And if you do any IO with ByteBuffers, or use a library that does, you're simply going to have a bad time with native memory usage, which is 10x as convoluted as selecting a GC and caring only about the heap.

And the best part: every month there's a new bug involving absurd native memory usage that's hard to track down, because regular metrics don't expose it.

So you have to enable NMT, which in most cases means a straight 5-10% performance degradation.
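For reference, NMT has to be switched on at JVM startup and is then queried with jcmd; a sketch, where `app.jar` and `<pid>` are placeholders:

```shell
# Enable Native Memory Tracking at startup (summary or detail mode);
# this is the switch that carries the ~5-10% overhead mentioned above.
java -XX:NativeMemoryTracking=summary -jar app.jar

# Then query the running JVM by pid for a native memory breakdown
# (heap, metaspace, thread stacks, symbol tables, etc.):
jcmd <pid> VM.native_memory summary
```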

And for latency, use ZGC or Shenandoah.

And don't forget the 50 JVM flags to tweak the memory usage of various parts of memory that can negatively impact your production, like caches, symbol tables, metaspace, thread caches, buffer caches, and others.

God, I hate that so much. Just let me set one param for memory and be done with it.


Sadly, nobody can run from memory management.


Oracle has an overview here [0]. But in general I'd say, unless you're facing an issue, just let the JVM pick the GC and only really tune -Xmx.

[0] https://docs.oracle.com/en/java/javase/18/gctuning/introduct...


Yup. Every once in a while, use the various available tools to watch your memory usage patterns and how the GC runs. If it looks fine, continue to leave it as is.


I think generational ZGC is going to be the "sane default" soon.


Eh, there's probably always going to be a different answer depending on whether you prioritize latency vs throughput.


The default GC depends on how many CPUs and how much RAM the VM has. With 1 core it is always SerialGC, for example; with 2 cores and less than 4 GB, concurrent mark and sweep, IIRC. G1GC kicks in a bit later.


This is so wrong:

>With 1 core it is always serialgc

Even with 1 CPU, ParallelGC has lower latencies than SerialGC. SerialGC will be better in an environment with limits on the number of threads, not the number of CPUs.

> 2 cores and less than 4 gb - concurrent mark and sweep IIRC

CMS was deprecated back in JDK 9 and removed later in a non-LTS release. JVM ergonomics will automatically turn on >G1GC< when it detects the JVM has at least 2 CPUs and 1792 MB of RAM (not heap; total memory). When either or both numbers are lower, ParallelGC is enabled automatically.
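Rather than memorizing the thresholds, you can ask the JVM what its ergonomics actually picked on a given machine; the Use*GC flag reported as true is the collector in effect:

```shell
# Print the final (ergonomically chosen) VM flags and filter for the GC
# selector flags, e.g. UseG1GC, UseParallelGC, UseSerialGC.
java -XX:+PrintFlagsFinal -version | grep -E 'Use\w+GC'
```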


The serial and parallel GCs are best used for applications that care more about throughput than latency (pauses). If you only have one CPU core, running multiple threads isn't going to speed up the GC cycle. The parallel collector just adds overhead due to context switching among threads.


Yeah, but you don't necessarily want to use those defaults. They strike me as kind of archaic.

For low-resource use cases, you probably want Graal AOT above all else... which means G1GC or Serial, I think.


Graal AOT, do you mean native image or the C2 compiler? Neither of those tunings is applicable in a discussion about GCs.


CMS was removed in 14.


It would be good to have more benchmarks and articles on G1 vs ZGC once that happens.


single-core, latency doesn't matter, low footprint: serial

throughput: parallel or G1

balance between latency, footprint and throughput: G1

latency more important than throughput or footprint: ZGC or Shenandoah

missiles and HFT: Epsilon
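For completeness, the flags that select each collector in the list above (`app.jar` is a placeholder; exact availability depends on your JDK build and version):

```shell
java -XX:+UseSerialGC     -jar app.jar   # single-core, low footprint
java -XX:+UseParallelGC   -jar app.jar   # throughput
java -XX:+UseG1GC         -jar app.jar   # balanced default since JDK 9
java -XX:+UseZGC          -jar app.jar   # low latency
java -XX:+UseShenandoahGC -jar app.jar   # low latency (not in Oracle JDK builds)
# Epsilon does no collection at all and is experimental:
java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -jar app.jar
```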



