I’m curious how the JVM efficiently allocates many small objects. How does it avoid memory fragmentation? Or does it request a large block of memory from the OS at the start? (last q is probably easily googled)
The JVM allocates small objects by bumping a pointer in the “young generation” region (in HotSpot, each thread bump-allocates into its own thread-local allocation buffer, or TLAB). The GC later copies the objects that are still alive out of that region, so the “young” region can be reused from scratch; the copying effectively defragments (compacts) that part of the heap. Modern GCs use multiple per-thread and/or per-core regions, i.e. there are generally multiple “young” regions, not just a single one. Memory is requested from the OS in large chunks.
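If you want to see this in action, a toy program like the one below (just a sketch; the class name, sizes, and flags are made up for illustration) churns through a flood of short-lived objects. Run it with `-Xlog:gc` on JDK 9+ and the log shows frequent, short young-generation collections while the old generation barely grows.

```java
// Sketch only: allocate lots of short-lived objects and watch minor GCs.
// Example invocation (flags/values illustrative): java -Xmx256m -Xlog:gc AllocationChurn
public class AllocationChurn {
    static byte[] sink; // occasionally published so the allocations aren't optimized away

    public static void main(String[] args) {
        for (long i = 0; i < 200_000_000L; i++) {
            // Each new byte[] is bump-allocated in the current thread's TLAB
            // and becomes garbage almost immediately.
            byte[] small = new byte[32];
            if ((i & 0xFFFFF) == 0) {
                sink = small;
            }
        }
        System.out.println(sink.length);
    }
}
```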
There is intermediate fragmentation due to dead-but-not-yet-collected objects. Together with the use of separate generational regions, this means GC’d languages need more memory (a rule of thumb is roughly twice what an equivalent non-GC program would use). But memory is cheap, and not having to reference-count and deallocate each object individually can conversely be a performance win.
Ok thanks, that helped clear some things up. I didn’t realize generational GC reuses the same contiguous memory blocks for the young regions. Makes sense from a fragmentation and resident page locality perspective.
Do you know if the young-generation region settings can be tweaked (alignment, size, number of threads/regions)? I’m wondering what happens if you “overflow” these areas by generating too many objects.
There are many parameters you can tweak, for sure. Filling up a young area triggers a minor GC run, and if that doesn’t free up enough space, survivors get promoted and region-based collectors can simply dedicate additional “young” regions. You only need to specify a high enough maximum value for the total memory consumption of all generations.
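For concreteness, here are a few of the relevant HotSpot knobs (the values are only examples, not recommendations, and `MyApp` is a placeholder):

```
# Values are examples only; "MyApp" is a placeholder.
java -Xms2g -Xmx2g \
     -Xmn512m \
     -XX:SurvivorRatio=8 \
     -XX:TLABSize=256k \
     -Xlog:gc+tlab=trace \
     MyApp
# -Xms/-Xmx: initial and maximum total heap size
# -Xmn: young-generation size
# -XX:SurvivorRatio: eden vs. survivor sizing within the young generation
# -XX:TLABSize: initial per-thread allocation buffer size
# -Xlog:gc+tlab=trace (JDK 9+): log TLAB sizing and refill decisions
# -XX:+PrintFlagsFinal -version: dump every tunable with its effective value
```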
One situation you can run into is generating and dropping objects so fast that the process spends most of its time in GC rather than in actual program execution; after a while the JVM then raises an exception (in HotSpot, java.lang.OutOfMemoryError: GC overhead limit exceeded). That situation almost always indicates a bug in your program, and the exception makes the cause of the stall visible instead of letting the process grind on. Of course that behavior is also configurable.
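In practice you usually hit that limit when almost everything is still reachable, so each collection recovers next to nothing. A contrived sketch (class name and sizes made up) that runs into it with a small heap:

```java
import java.util.ArrayList;
import java.util.List;

// Contrived sketch: retain everything so the collector runs constantly but
// frees almost nothing. With e.g. -Xmx64m -XX:+UseParallelGC this typically
// dies with "java.lang.OutOfMemoryError: GC overhead limit exceeded"
// (other collectors may report plain "Java heap space" instead).
public class GcOverhead {
    public static void main(String[] args) {
        List<int[]> retained = new ArrayList<>();
        for (long i = 0; ; i++) {
            retained.add(new int[64]); // never becomes garbage
            if (i % 1_000_000 == 0) {
                System.out.println("retained " + i + " arrays");
            }
        }
    }
}
```

The “configurable” part is the overhead check itself: -XX:GCTimeLimit and -XX:GCHeapFreeLimit relax the thresholds, and -XX:-UseGCOverheadLimit disables it entirely.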
The latest GCs are _fast_. I benchmarked ZGC and IIRC it ate up something like 4GB/s of garbage (24 threads doing nothing but allocating) with about 2 milliseconds of stop-the-world pause time over around 3 minutes of runtime.
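Roughly the shape of harness I mean (numbers and names are illustrative, not the exact code I ran):

```java
// Allocation-stress sketch: N threads doing nothing but allocating garbage.
// Example invocation: java -XX:+UseZGC -Xmx8g -Xlog:gc AllocStress
// (-XX:+UseZGC works out of the box on JDK 15+; on 11-14 you also need
//  -XX:+UnlockExperimentalVMOptions)
public class AllocStress {
    static volatile Object sink; // defeat escape analysis / dead-code elimination

    public static void main(String[] args) throws InterruptedException {
        int threads = 24;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long end = System.nanoTime() + 180_000_000_000L; // ~3 minutes
                while (System.nanoTime() < end) {
                    sink = new byte[256]; // becomes garbage immediately
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        // Watch the gc log for collection frequency and pause behavior.
    }
}
```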
Note that benchmarking GCs properly is really hard. Changes in object size distribution, object-graph shape, and lifetimes can lead to drastically different results (and to make matters worse, the kind of code that is easiest to write as a benchmark tends to be exactly the kind of code a GC handles really well).
Looks like it’s complicated. The JVM reserves heap memory up front, but on Linux that is virtual memory whose pages only get faulted in when first touched. More memory also has to be requested later if the initial amount turns out to be insufficient. And none of this includes any stack or swap resources the JVM might use.
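If you want the lazy page-faulting out of the equation, HotSpot lets you commit and touch the whole heap at startup (sizes below are just an example, `MyApp` is a placeholder):

```
# Reserve and commit the full heap up front and touch every page at startup,
# trading longer startup time for fewer page-fault hiccups later.
java -Xms4g -Xmx4g -XX:+AlwaysPreTouch MyApp
```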