
“I admit I don’t understand these results. There’s clearly nothing in the runtime itself that prevents these types of speeds.” Oh, there is. The default Java serialization is sort of like the “pickle” module in Python, if you are familiar with it. It will deal with pretty much anything you throw at it, figuring out the data structures and offsets to serialize or parse at runtime. More efficient methods trade universality for speed: the offsets and the calls to read/write the parts of the structure are determined in advance. Also, it's hard to say without the source code, but there is a high chance that even more efficient methods like Protobuf create a lot of Java objects, and that kills cache locality. With Java, you have to go out of your way to maintain good cache locality, because you give up control over memory layout for automatic memory management.
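
To make the contrast concrete, here is a minimal sketch (hypothetical types, not from the article): the reflective path discovers the structure at runtime, while the hand-rolled path writes fields whose order and widths were decided in advance.

    import java.io.*;

    public class FixedLayoutDemo {
        // Reflective path: ObjectOutputStream figures out the class
        // layout at runtime, much like Python's pickle.
        static byte[] writeReflective(Serializable obj) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(obj);
            }
            return bos.toByteArray();
        }

        // Hand-rolled path: field order and widths are fixed in
        // advance, so nothing is discovered at runtime.
        static byte[] writeFixed(short id, String name) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bos)) {
                out.writeShort(id);   // 2 bytes, known offset
                out.writeUTF(name);   // length-prefixed UTF-8
            }
            return bos.toByteArray();
        }
    }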



>With Java, you have to go out of your way to maintain good cache locality because you give up control over memory layout for automatic memory management.

There is a shadowy cult in a hidden corner of the Java community, a heresy to many, followed only by a handful of obnoxious zealots inspired by the dark ages of Ada 83, C, or even assembly, who take pride in creating Java programs that allocate only a fixed number of objects no matter how long you run them, and for whom the "new" keyword is a taboo whose avoidable use amounts to blasphemy.

As a member of this sect, on a few occasions when presenting some of our programs on a laptop, I've had dumbfounded observers looking around the laptop for the network cable linking it to the server they assumed it must have been running on.
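
For the curious, the core trick looks roughly like this (a sketch with made-up names, not our actual code): every object the program will ever use is created at startup, and the steady-state loop only borrows and returns them.

    import java.util.ArrayDeque;

    final class MessagePool {
        static final class Message {
            final byte[] payload = new byte[1024]; // fixed size, reused
            int length;
        }

        private final ArrayDeque<Message> free = new ArrayDeque<>();

        MessagePool(int capacity) {
            for (int i = 0; i < capacity; i++) {
                free.push(new Message()); // the only 'new', at startup
            }
        }

        Message acquire()       { return free.pop(); }  // no allocation
        void release(Message m) { free.push(m); }       // no GC work
    }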


Ah, so you're one of the six Epsilon GC users :-)


This is pretty much a necessity sometimes. I was writing an audio synthesizer in C#, and this is exactly what I had to do: allocate whatever I need up front and ban the new keyword from then on.

Not that different from C++, had I chosen that instead.


To be fair, working with audio is nice in that you usually know the exact size of the buffer you need, so it is easy to either stackalloc it or NativeMemory.Alloc the pointer and wrap it in a span, completely bypassing the GC. The downside is that there aren't many up-to-date libraries, but working with system APIs directly, or e.g. PortAudio bindings, is also an option.
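
For comparison, a rough Java analogue of that pattern (a sketch assuming Java 22+ and the Foreign Function & Memory API; the buffer size and fill loop are made up):

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;

    public class OffHeapBuffer {
        public static void main(String[] args) {
            int frames = 512; // buffer size known in advance, as with audio
            try (Arena arena = Arena.ofConfined()) {
                // Off-heap allocation: the GC never sees this memory.
                MemorySegment buf = arena.allocate(ValueLayout.JAVA_FLOAT, frames);
                for (int i = 0; i < frames; i++) {
                    buf.setAtIndex(ValueLayout.JAVA_FLOAT, i, 0.0f); // fill samples
                }
            } // freed deterministically here, like NativeMemory.Free
        }
    }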


> Also, hard to say without source code but there is a high chance even more efficient methods like Protobuf create a lot of Java objects and that kills cache locality

I don’t think this can be claimed that easily without more info. Generational GCs work pretty much like an arena allocator, with very good cache locality (think of an ArrayList being filled with objects allocated one after another in short order: the objects will sit right next to each other in memory). And if the objects are short-lived, allocating them can be nearly as cheap as stack allocation (thread-local allocation buffers that just bump a pointer).
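
A sketch of the pattern being described (made-up Point type): each 'new' in the loop is a thread-local bump-pointer allocation, so consecutive objects land next to each other in the young generation and the later walk is sequential in memory.

    import java.util.ArrayList;
    import java.util.List;

    public class TlabLocality {
        record Point(double x, double y) {}

        public static void main(String[] args) {
            int n = 1_000_000;
            List<Point> points = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                points.add(new Point(i, i * 2.0)); // bump-pointer allocations
            }
            double sum = 0;
            for (Point p : points) { // walks objects that sit side by side
                sum += p.x();
            }
            System.out.println(sum);
        }
    }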


GC pressure is another factor. Even in the trivial-ownership case, GCing in an environment that allocates gigabytes per second comes at a cost.


Adding Guava testlib's GcFinalization.awaitFullGc() before each benchmark run and switching to -XX:+UseParallelGC, I saw the runtimes decrease by 30s in Bench_Fury_Ordinal and by 20s in Bench_ObjectOutputStream. Ideally you would run under JMH to keep JIT warmup, leftovers from previous runs, etc. from polluting your results.
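
For illustration, a skeleton of that setup (the benchmark body is a placeholder, not the repository's actual code):

    import com.google.common.testing.GcFinalization;
    import org.openjdk.jmh.annotations.*;
    import java.util.concurrent.TimeUnit;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Fork(value = 1, jvmArgsAppend = "-XX:+UseParallelGC")
    @Warmup(iterations = 5)
    @Measurement(iterations = 5)
    @State(Scope.Benchmark)
    public class SerializationBench {

        @Setup(Level.Iteration)
        public void drainGarbage() {
            // Force a full GC so garbage left over from earlier runs
            // is not charged to the benchmark being measured.
            GcFinalization.awaitFullGc();
        }

        @Benchmark
        public Object roundTrip() {
            return new Object(); // placeholder for serialize/deserialize
        }
    }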


Did you dial it up to run for 1 billion items?

In general I'm not a big fan of JMH for testing sustained I/O scenarios, as CPU, OS and storage behavior are extremely relevant, and JMH tends to interfere with access patterns.


Nope, I used the default in your GitHub repository (10_000_000). From a quick profile it looked like allocations from previous benchmarks were crossing into later runs, which were then penalized unfairly, so I made those small adjustments.


The article does have source code.

I don’t think any of the examples use Java’s Serializable. The first attempt reads shorts and UTF-8 directly from the stream.


ObjectInputStream is one of the faster stream options tested.


True, but none of the slow methods in the article involve this.



