Peter Lawrey Describes Petabyte JVMs (infoq.com)
117 points by ancatrusca on March 11, 2015 | 21 comments



There are some interesting challenges when scaling this large with any system.

Peter referenced some of Azul Systems' work and their custom GC.

Cliff Click from Azul has done some great Google Tech Talks about JVM development and scaling JVMs out to really big systems.[1][2]

[1] A JVM Does That? - https://www.youtube.com/watch?v=uL2D3qzHtqY

[2] Java on a 1000 Cores - https://www.youtube.com/watch?v=5uljtqyBLxI


Peter also has some great off-heap libraries. It is relatively easy to bypass GC completely.


Here's Peter's HugeCollections repo[1].

I expected to see some native code being called via JNI, but it's all done in Java.

[1] https://github.com/peter-lawrey/HugeCollections


Well, the native code is replaced by the use of sun.misc.Unsafe to do the off-heap allocation and access (look in the io & storage classes in https://github.com/OpenHFT/Java-Lang/tree/master/lang/src/ma... ).

I wonder what this sort of library will look like when Panama is done, assuming it provides the sort of structured off-heap storage support I'm hoping for.
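For anyone curious, here's a minimal sketch of the trick (my own toy code, not taken from Java-Lang; the class name and memory layout are made up): grab the Unsafe singleton via reflection, allocate a block outside the heap, and read/write primitives at raw addresses.

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    // Toy sketch of off-heap allocation with sun.misc.Unsafe (Java 8 era).
    public class OffHeapSketch {
        private static final Unsafe UNSAFE = loadUnsafe();

        private static Unsafe loadUnsafe() {
            try {
                Field f = Unsafe.class.getDeclaredField("theUnsafe");
                f.setAccessible(true);
                return (Unsafe) f.get(null);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        public static void main(String[] args) {
            long address = UNSAFE.allocateMemory(1024); // block is invisible to the GC
            try {
                UNSAFE.putLong(address, 42L);           // write a long at offset 0
                UNSAFE.putInt(address + 8, 7);          // write an int at offset 8
                System.out.println(UNSAFE.getLong(address));    // 42
                System.out.println(UNSAFE.getInt(address + 8)); // 7
            } finally {
                UNSAFE.freeMemory(address);             // manual lifetime management
            }
        }
    }

The GC never sees that 1024-byte block; the library layers typed, map-like accessors on top of exactly this kind of raw addressing.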


Looks like this is the repo with more recent activity: https://github.com/OpenHFT/HugeCollections


In fact, "to bypass GC completely" is impossible, because you still need on-heap objects to access the off-heap data. There is no magic.


It is possible to program in Java without GC allocations at all (if you do not count the class loader and some other system machinery). Use Unsafe and primitive (on-stack) variables. Peter has some nice demos on his blog.
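A trivial sketch of that style (mine, not one of Peter's demos): keep everything in primitive locals so nothing is handed to the GC inside the hot path.

    // Allocation-free hot path: primitive locals only, no boxing, no iterator objects.
    public final class NoAllocSum {
        static long sum(long[] values) {
            long total = 0;                      // primitive local, lives on the stack
            for (int i = 0; i < values.length; i++) {
                total += values[i];              // no Long boxing, no garbage per iteration
            }
            return total;
        }

        public static void main(String[] args) {
            long[] data = {1, 2, 3, 4, 5};       // allocated once, up front
            System.out.println(sum(data));       // 15
        }
    }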


But the on-heap references you keep are going to be much smaller and thus alleviate a lot of GC work. Off-heap storage tends to be used for things like caching, and caching in Java does incur GC costs that you avoid with off-heap stores. It's not a bad solution.
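A rough sketch of that shape (the names are mine, not from HugeCollections): the payloads live in a direct ByteBuffer off the heap, so the GC only ever traces one small wrapper object instead of millions of cached entries.

    import java.nio.ByteBuffer;

    // Small on-heap handle over a big off-heap slab of fixed-size slots.
    public class OffHeapSlab {
        private final ByteBuffer slab;   // direct buffer: data sits outside the heap
        private final int slotSize;

        public OffHeapSlab(int slots, int slotSize) {
            this.slab = ByteBuffer.allocateDirect(slots * slotSize);
            this.slotSize = slotSize;
        }

        public void put(int slot, byte[] value) {
            ByteBuffer view = slab.duplicate();          // cheap view, no data copy
            view.position(slot * slotSize);
            view.put(value, 0, Math.min(value.length, slotSize));
        }

        public byte[] get(int slot, int length) {
            byte[] out = new byte[length];
            ByteBuffer view = slab.duplicate();
            view.position(slot * slotSize);
            view.get(out, 0, length);
            return out;
        }
    }

The entries still need (de)serialization into bytes on each access, which is where libraries like HugeCollections earn their keep.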


Azul's work on Java and the JVM is really great, but it ultimately comes down to their commercial focus.

Java has seen many improvements in recent years. I have summarized most of the high-performance Java landscape in one of my unofficially published projects:

http://land-z.org

The problem is that the evolution of the Java language itself still moves in the style of a single company, and hacking deep into Java remains the preserve of experts.

> Work is already underway for atomic value types, classdynamic, a new FFI layer, and a next-gen, fully programmable JIT (Graal).

Most of those are already in other languages. Never mind Java 9: can you tell me which of them will actually land in Java 10? Is the year 2018 OK? That is why people said Java was slow.


> Azul's work on Java and the JVM is really great, but it ultimately comes down to their commercial focus.

Just like any other programming language that tried to bring research into the mainstream.

If it wasn't for the money invested by companies into C++, Java, .NET and JavaScript ecosystems, we would all be doing C most likely.

All the better alternatives to C in its day died from lack of investment.

As for waiting.

It is no different from C++ developers eagerly waiting for concepts lite and modules, which might be available across major compilers (not only on desktop systems) around 2020 if all goes well.

Besides how many companies in the world have to worry about petabyte JVMs? Very few.


This is really cool and awesome, but after seeing the slow pace of JVM development over the last 6 years I really despair of doing major systems/database work in Java. It just feels like we are 10+ years ahead of where the VM is at.


> the slow pace of JVM development

Just in the past couple of years we've had a terrific new GC and the most powerful profiler I've ever seen (Java Flight Recorder) added to the JVM (the latter, unfortunately, not to OpenJDK yet). In a year, we'll have modules, JIT caching/AOT (possibly) and runtime control over JIT optimizations[1]. Work is already underway for atomic value types, classdynamic, a new FFI layer, and a next-gen, fully programmable JIT (Graal). Many of those enhancements are not only ahead of other production quality offerings, but truly state of the art if not groundbreaking. Even when I look hard I can't find another runtime that's less than 5+ years behind (and that's being generous).

[1]: http://openjdk.java.net/jeps/165
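For reference, in the Java 8 era both of those are just command-line flags (Oracle JDK only for JFR, which also requires the commercial-features flag; the jar name and recording file below are placeholders):

    # Enable G1 with a pause-time goal, plus Java Flight Recorder (Oracle JDK 8)
    java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
         -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
         -XX:StartFlightRecording=duration=60s,filename=myapp.jfr \
         -jar myapp.jar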


> we've had a terrific new GC

I assume you're referring to G1. AFAIK the first paper on G1 was published in 2004 when they already had working code — in the past couple of years it has merely stopped crashing on a regular basis. I remember times during JDK 8 development when they would fix one or two crashing bugs every week. In addition it has a 200ms stop-the-world pause that you can't get around (you can try to lower it but not into the 10ms area).

> and the most powerful profiler I've ever seen

JFR has been in JRockit for a long time. It took them four years to port it to the Oracle JDK. I doubt it will ever be part of OpenJDK and assume you'll always need an Oracle Java support license if you want to run it in production. I dare you to get a quote for an Oracle Java support license.


What's your point? Are those technologies less groundbreaking or not years ahead of anyone else because they've been developed for years?

My experience with G1 has been a higher CPU consumption overall, but a very significant reduction in worst-case STW pause times. It works especially well for "session scope" objects: objects that are born together and die together, but may live for a relatively long time (seconds to minutes). I am not aware of any other runtime with GCs that even come close to the JVM's.

I am also not aware of any other profiler with such depth of reporting and such low overhead as JFR.

The work on Graal will continue for years, but when it is ready, it will be way ahead of the curve.

Innovation in the JVM is vibrant, and we're getting new, extremely powerful features at a good pace. The fact that each of those features takes years to implement well only proves the emphasis on big advancements (although minor advancements, like the JIT control feature in Java 9, are also made regularly).


> What's your point?

JVM development is slow. Things take years to a decade. If we want to have a GC that's better than G1 a decade from now, we have to start now. G1 does not have an order-of-magnitude improvement in STW pauses in it. It may be good (or acceptable, depending on where you stand) today, but a decade from now we will need something better.

The best features in the world are useless if you can't use them because of prohibitive licensing cost.


If you mean development latency, then you're right (throughput is quite high). And there are people working on next-gen GCs and next-gen JITs already.


How do you define "major"? By "10+ years ahead", are you referring to techniques for addressing very large datasets in memory?


This should change with Java 10. Project Jigsaw looks to be a pretty major change to the way the JVM will work, scaling from very small to very big. It remains to be seen how well it pans out.


What surprised me is that when scaling vertically they stuck with Intel. Have IBM/Oracle really become that irrelevant?


I think the idea is to bring this kind of scale to a wider audience at a lower price point.


tl;dr: our huge, highly-dynamic financial models are really expensive to run. Please use your own resources and consider helping us get the Java ecosystem up to a state where we could do something like this without reinventing Java inside our niche. =)



