
“I admit I don’t understand these results. There’s clearly nothing in the runtime itself that prevents these types of speeds.” Oh, there is. The default Java serialization is sort of like the “pickle” module in Python, if you are familiar with it. It will deal with pretty much anything you throw at it, figuring out the data structures and offsets to serialize or parse at runtime. More efficient methods trade universality for speed: the offsets and the calls to read/write the parts of the structure are determined in advance. Also, it's hard to say without the source code, but there is a high chance that even more efficient methods like Protobuf create a lot of Java objects, and that kills cache locality. With Java, you have to go out of your way to maintain good cache locality, because you give up control over memory layout for automatic memory management.
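
To make the contrast concrete, here is a minimal sketch (hypothetical types, not from the article): the reflective path discovers the structure at runtime, while the hand-rolled path writes fields whose order and widths were decided in advance.

    import java.io.*;

    public class FixedLayoutDemo {
        // Reflective path: ObjectOutputStream figures out the class
        // layout at runtime, much like Python's pickle.
        static byte[] writeReflective(Serializable obj) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(obj);
            }
            return bos.toByteArray();
        }

        // Hand-rolled path: field order and widths are fixed in
        // advance, so nothing is discovered at runtime.
        static byte[] writeFixed(short id, String name) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bos)) {
                out.writeShort(id);   // 2 bytes, known offset
                out.writeUTF(name);   // length-prefixed UTF-8
            }
            return bos.toByteArray();
        }
    }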



>With Java, you have to go out of your way to maintain good cache locality because you give up control over memory layout for automatic memory management.

There is a shadowy cult in a hidden corner of the Java community, a heresy to many, followed only by a handful of obnoxious zealots inspired by the dark ages of Ada 83, C, or even assembly, who take pride in creating Java programs that allocate only a fixed number of objects no matter how long you run them, and for whom the "new" keyword is a taboo whose avoidable use amounts to blasphemy.

As a member of this sect, on a few occasions when presenting some of our programs on a laptop, I've had dumbfounded observers looking around the laptop for the network cable linking it to the server they assumed it must have been running on.
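
For the curious, the core trick looks roughly like this (a sketch with made-up names, not our actual code): every object the program will ever use is created at startup, and the steady-state loop only borrows and returns them.

    import java.util.ArrayDeque;

    final class MessagePool {
        static final class Message {
            final byte[] payload = new byte[1024]; // fixed size, reused
            int length;
        }

        private final ArrayDeque<Message> free = new ArrayDeque<>();

        MessagePool(int capacity) {
            for (int i = 0; i < capacity; i++) {
                free.push(new Message()); // the only 'new', at startup
            }
        }

        Message acquire()       { return free.pop(); }  // no allocation
        void release(Message m) { free.push(m); }       // no GC work
    }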


Ah, so you're one of the six Epsilon GC users :-)


This is pretty much a necessity sometimes. I was writing an audio synthesizer in C#, and this is exactly what I had to do: allocate whatever I need up front and ban the new keyword from then on.

Not that different from C++, had I chosen that instead.


To be fair, working with audio is nice in that you usually know the exact size of the buffer you need, so it is easy to either stackalloc it or NativeMemory.Alloc the pointer and wrap it in a span, completely bypassing the GC. The downside is that there aren't many up-to-date libraries, but working with system APIs directly, or e.g. PortAudio bindings, is also an option.
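
For comparison, a rough Java analogue of that pattern (a sketch assuming Java 22+ and the Foreign Function & Memory API; the buffer size and fill loop are made up):

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;

    public class OffHeapBuffer {
        public static void main(String[] args) {
            int frames = 512; // buffer size known in advance, as with audio
            try (Arena arena = Arena.ofConfined()) {
                // Off-heap allocation: the GC never sees this memory.
                MemorySegment buf = arena.allocate(ValueLayout.JAVA_FLOAT, frames);
                for (int i = 0; i < frames; i++) {
                    buf.setAtIndex(ValueLayout.JAVA_FLOAT, i, 0.0f); // fill samples
                }
            } // freed deterministically here, like NativeMemory.Free
        }
    }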


> Also, hard to say without source code but there is a high chance even more efficient methods like Protobuf create a lot of Java objects and that kills cache locality

I don’t think this can be claimed that easily without more info. Generational GCs work pretty much like an arena allocator, with very good cache locality (think of an ArrayList being filled with objects allocated one after another in short order: the objects will sit right next to each other in memory). And if the objects are short-lived, allocating them can be nearly as cheap as stack allocation (thread-local allocation buffers that just bump a pointer).
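
A sketch of the pattern being described (made-up Point type): each 'new' in the loop is a thread-local bump-pointer allocation, so consecutive objects land next to each other in the young generation and the later walk is sequential in memory.

    import java.util.ArrayList;
    import java.util.List;

    public class TlabLocality {
        record Point(double x, double y) {}

        public static void main(String[] args) {
            int n = 1_000_000;
            List<Point> points = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                points.add(new Point(i, i * 2.0)); // bump-pointer allocations
            }
            double sum = 0;
            for (Point p : points) { // walks objects that sit side by side
                sum += p.x();
            }
            System.out.println(sum);
        }
    }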


GC pressure is another factor. Even in the trivial-ownership case, GCing in an environment that allocates gigabytes per second comes at a cost.


Adding Guava testlib's GcFinalization.awaitFullGc() before each benchmark run and switching to -XX:+UseParallelGC, I saw the runtimes decrease by 30s in Bench_Fury_Ordinal and by 20s in Bench_ObjectOutputStream. Ideally you would run under JMH to keep JIT warmup, leftovers from previous runs, etc. from polluting your results.
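
For illustration, a skeleton of that setup (the benchmark body is a placeholder, not the repository's actual code):

    import com.google.common.testing.GcFinalization;
    import org.openjdk.jmh.annotations.*;
    import java.util.concurrent.TimeUnit;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Fork(value = 1, jvmArgsAppend = "-XX:+UseParallelGC")
    @Warmup(iterations = 5)
    @Measurement(iterations = 5)
    @State(Scope.Benchmark)
    public class SerializationBench {

        @Setup(Level.Iteration)
        public void drainGarbage() {
            // Force a full GC so garbage left over from earlier runs
            // is not charged to the benchmark being measured.
            GcFinalization.awaitFullGc();
        }

        @Benchmark
        public Object roundTrip() {
            return new Object(); // placeholder for serialize/deserialize
        }
    }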


Did you dial it up to run for 1 billion items?

In general I'm not a big fan of JMH for testing sustained I/O scenarios, as CPU, OS and storage behavior are extremely relevant, and JMH tends to interfere with access patterns.


Nope, I used the default in your GitHub repository (10_000_000). From a quick profile it looked like allocations from previous benchmarks were crossing into later runs, which were then penalized unfairly, so I made those small adjustments.


The article does have source code.

I don’t think any of the examples use Java’s Serializable. The first attempt reads shorts and UTF-8 directly from the stream.


ObjectInputStream is one of the faster stream options tested.


True, but none of the slow methods in the article involve this.



