Java garbage collection can be really slow (jvns.ca)
161 points by ingve on April 23, 2016 | 181 comments



Probably a book needs to be written, something like "Pragmatic Garbage Collection", summarizing good practices to avoid surprises like the one the author of the article ran into. Having used Java since its creation, along with other GCed languages, I would summarize them as follows:

- avoid allocating objects on the heap which you do not have to allocate. The fewer fresh allocations you make, the less work the GC has to do. That does not mean you should write ugly and complex code, but if the tool described in the article were, for example, grep-like, then one should not have to allocate each line read separately on the heap just to discard it. If possible, use a buffer for reading in, if the I/O libraries allow it (a sketch of this follows at the end of this comment).

- generational GCs try to work around this a bit: the youngest generation is collected very quickly, on the assumption that the majority of the objects are already "dead" when it happens, and only the "survivors" are copied to older generations. Make sure that the youngest generation is large enough that this assumption holds, and that only objects which indeed have a longer lifetime are promoted to older generations.

- language/library design makes a huge difference in how much pressure there is on the GC system. Fewer heap allocations help, as do languages which try not to create overly complex heap layouts. In Java, an array of objects means an array of pointers to objects which could be scattered around the heap, while in Go you can have an array of structs which is one contiguous block of memory; that drastically reduces heap complexity (but of course takes more effort to reallocate when growing).

- good library design can bring a lot of efficiency. At some point in time, just opening a file in Java would create several separate objects which referred to each other (a buffered reader which points to the file object...). My impression is that "modern" Java libraries too often create even larger object chains for a single task. This can add to the GC pressure.

Of course, ignoring all these practices can equally "well" bring a program with manual allocation to a crawl. So in summary I am a strong proponent of GC, but one needs to be aware of at least the performance tradeoffs that different factorings of one program can bring. Modern GCs are incredibly fast, but that is not a magic property.
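As a sketch of the first point (a hypothetical grep-like line counter, not the article's actual tool): reading through one fixed, reused buffer keeps the hot loop from creating a fresh String per line.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class LineCount {
        public static void main(String[] args) throws IOException {
            long lines = 0;
            byte[] buf = new byte[64 * 1024];              // one buffer, reused for the whole file
            try (InputStream in = new FileInputStream(args[0])) {
                int n;
                while ((n = in.read(buf)) > 0) {
                    for (int i = 0; i < n; i++) {
                        if (buf[i] == '\n') lines++;       // no per-line String objects
                    }
                }
            }
            System.out.println(lines);
        }
    }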


One problem with Java is, for the time being, the lack of proper value types.

Had a language like Eiffel, Oberon dialects or Modula-3 taken its role in the industry, I bet we wouldn't be having these constant discussions of GC vs. manual memory management in terms of performance.

Another issue is that many developers, even in languages with GC and value types, tend to be "new() happy", sometimes leading to designs that are very hard to refactor when the need arises, given the differences in the type system between reference and value types.

Eiffel is probably the only one I can remember, where the difference is a simple attribute.


Does Haskell also qualify?

https://downloads.haskell.org/~ghc/latest/docs/html/users_gu...

Haskell doesn't quite have a distinction between reference and value types, but there are boxed and unboxed values, which I think fits the bill for the purpose of this discussion.


> Haskell doesn't quite have a distinction between reference and value types, [...]

Perhaps the difference between normal values and values wrapped in e.g. IORef counts?


To a Haskeller, you could explain Java-style reference types by saying “it's an artificial limitation whereby every value is behind an IORef, and the value itself isn't a first-class entity”. But Haskell itself doesn't have this artificial limitation, and every type is a value type.


In Haskell, since it is a pure language, every type is a value type in the sense that you can pass around values without worrying that callees might overwrite your data.

However, these types (excepting unboxed types) can also be considered reference types since values are technically passed by reference. Boxed values are stored in thunks, and only accessed through these (at least conceptually). Thunks in turn have GC overhead.

Boxed values even have associated internal state, though there is not much control over it: the state of their evaluation.

So another viewpoint is that boxed Haskell types are just like Java reference types, but from the outside they seem immutable.


> However, these types (excepting unboxed types) can also be considered reference types since that is how values are passed.

This is an implementation detail, and abstractions shall never be conflated with their possible implementations. What matters to the user of the abstraction is that you're passing around (a computation[0] that evaluates to) the value itself, not some mutable entity whose current state is the value.

> So another viewpoint is that boxed Haskell types are just like Java reference types, but they are immutable.

This is wrong. In Java, all objects have a distinct identity, so, for instance, `new Complex(2,3) == new Complex(2,3)` evaluates to `false`, even if both objects represent the complex number `2+3i`.

[0] Since Haskell is non-strict.


This is an implementation detail, and abstractions shall never be conflated with their possible implementations

We're talking about performance here, right? So unfortunately this statement is totally untrue in that context.

Java world wants value types largely to avoid the pointer chasing that hurts CPU cache effectiveness. Haskell suffers the same issue, then. You can make what are effectively value types in Java as long as you don't use == to compare but rather .equals, make all the fields final, override toString/equals/hashCode etc., and more sensible JVM-targeting languages of course convert == into calls to .equals by default.
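For illustration, a minimal sketch of the "value type by convention" recipe described above (Complex is just a made-up example, not anything from the thread):

    public final class Complex {
        private final double re;
        private final double im;

        public Complex(double re, double im) { this.re = re; this.im = im; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Complex)) return false;
            Complex c = (Complex) o;
            return re == c.re && im == c.im;               // compare by value, never by identity
        }

        @Override public int hashCode() { return 31 * Double.hashCode(re) + Double.hashCode(im); }

        @Override public String toString() { return re + " + " + im + "i"; }
    }

The JVM still sees an ordinary heap object, though, which is exactly the limitation pointed out further down in this thread.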


> We're talking about performance here, right? So unfortunately this statement is totally untrue in that context.

You don't need implementation details to discuss performance. Abstractions can come with a cost model. (C++'s standard library concepts are a great example of this.) And Haskell's cost model says that values can be passed around in O(1) time.

> Java world wants value types largely to avoid pointer chasing that hurts cpu cache effectiveness. Haskell suffers the same issue, then.

That's not the whole story. When the physical identity of an object no longer matters (note that immutability alone isn't enough), the language implementation can, without programmer intervention:

(0) Merge several small objects into a single large object. (Laziness kind of gets in the way, though. ML fares better in this regard.)

(1) Use hash consing or the flyweight pattern more aggressively.

> You can make what are effectively value types in Java as long as you don't use == to compare but rather .equals, make all the fields final, override toString/equals/hashcode etc, and more sensible JVM targeting languages of course convert == into calls to .equals by default.

But the JVM itself is completely unaware that you intend to use a class as a value type, and thus can't automatically apply optimizations that are no-brainers in runtime systems for languages where all types are value types.

Also, `.equals()`'s type is broken: it allows you to compare any pair of `Object`s, but it should only allow you to compare objects of the same class.


I know what optimisations identity-less aggregates allow. Objects can be passed around in O(1) time in any language that uses references. It's not the issue. Indeed passing value types is O(n) in the size of the value type, so that's a loss. The goal is to avoid data dependent loads (i.e. pointer dereferences) by flattening/inlining data structures.

As you point out, Haskell could in theory merge things together and lay out memory more effectively, but in theory so could Java with e.g. interprocedural escape analysis. But in practice there's a limit to what automatic compiler optimisations can do - learning this the hard way is a big part of the history of functional languages (like, how many functional language runtimes automatically parallelise apps with real speedups?). Java sometimes does convert object allocations into value types via the scalarisation optimisations, but not always, hence the desire to add it to the language.


Passing around copies is O(n) in the size of the copy. This has nothing to do with call-by-value, and everything to do with the absurdity of having to defensively clone objects to cope with arbitrary mutation.


> And Haskell's cost model says that values can be passed around in O(1) time.

I am not sure of that. The compiler is free to inline expressions, or otherwise evaluate them multiple times.


What you say is not incorrect. But you totally miss the context of the discussion.

And you switch sides just to disagree with your last quote.


Yeah as I said it's a viewpoint. The viewpoint I meant is the technical (and performance-centric) one, not the mathematical one.


> The viewpoint I meant is the technical

My objection was also a technical one: Haskell doesn't give you access to the physical identity of any runtime object that isn't a reference cell (`IORef`, `STRef`, `MVar`, `TVar`, etc.).

> (and performance-centric) one

The ability to pass around arbitrarily complicated values in O(1) time is an intrinsic part of Haskell's cost model. This is the most natural thing in the world, unless you've lived all your life subordinating values to object identities, in which case, yes, non-destructively transferring a value from one object identity to another (aka “deep cloning”) might be an arbitrarily expensive operation.


One thing I like about Go is its strong Oberon heritage, picking up where those languages left off.


That is what attracted me initially to it, but then I got disappointed with the overall direction the language design was going.

I am more of a Swift/Rust guy than Go, in terms of features.

Even Oberon eventually evolved into Active Oberon and Component Pascal variants, both more feature rich than Go.

To be honest, Niklaus Wirth's latest design, Oberon-07, is even more minimalist than Oberon itself.

EDIT: Typo


>then I got disappointed with the overall direction the language design was going.

Can you elaborate? Thanks.


For me the fact that Go is a descendant of Oberon-2 and Limbo is quite interesting, but there are several features that a modern language should have that aren't present in Go and never will be.

Hence I rather see the appeal of Go as a way to attract developers that would otherwise use C, so that they make use of a safer programming language.

Many turn to C just because they don't know other AOT-compiled languages well, not because they really need any special C feature.

Regardless of the discussion about whether it is a systems programming language or not, I think it can be, given its lineage. Someone only needs to take the bootstrapped version (1.6), write a bare-metal runtime, and then it would be proven. Maybe a nice idea for someone looking for a PhD thesis in the OS area.

Me, I would rather make use of a .NET, JVM or ML influenced language, as those have type systems more to my liking.


Considering that Wirth's latest Oberon update took features away, I'm not sure whether he and Pike et al. would really agree about that... But yeah, it's an interesting language in that family, although I'd still much rather have a modern Modula-3 system...


As I only have used Modula-2 and Go, what are the features of Modula-3 missing in Go?


Quite a few:

- Enumerated types

- Enumerations as array indexes

- Generics

- Classic OO with inheritance

- Untraced references in unsafe packages

- Unsafe packages for low level systems programming

- Exceptions

- Reference parameters

- Bit packing

- Since Modula-3 was a system programming language for SPIN OS, the runtime library was richer, including GUI components

There are a few other features.


Man I loved the Eiffel book. When I think about the difference between invariants and pre/post-condition contracts, vs the "bean" anti-pattern that industry went with, I want to vomit.


Could you elaborate on what it is about those three languages? Are you talking about copying GCs vs. mark-and-sweep? Thanks.


Java is one of the few languages that only has reference types.

All the languages that I referenced have value types in the same lineage as Algol derived languages.

This means that you can make use of the stack, global statics, structs of arrays, arrays of structs and so on.

The GC only comes into play when you make use of the heap, of course. The GC also has a richer API, given that, besides Eiffel, the other languages are system programming languages. So you can let the GC know that certain areas aren't to be monitored, or release them right away (manual style).

So given that you have all the memory allocation techniques at your disposal the stress on the GC isn't as big as in Java's case.

But sadly none of those earned the hearts of the industry.

The closest you have to them are D (which deserves a better GC implementation) and the improvements coming to .NET Native via the Midori project.


But surely you could implement "global statics, structs of arrays, arrays of structs" in Java (classes with public members) as well, even if it lacks value types (aside from primitives, that is), no?


Yes, but you won't gain much given the prevalence of references.

You will need to decompose the classes across multiple arrays, thus leading to very hard to maintain code.


Thanks for the explanation. This makes me want to read up on Algol and Eiffel. Do you have any recommendations? It's funny how Algol finds its way into a lot of discussions lately.


>Java is one of the few languages that only has reference types.

Uhm, aren't those "primitives"? Or did you mean user-definable value-types?


Thanks for writing this, I agree especially with the first point. Scott Oaks's "Java Performance" [1] does a good job of explaining the different GC's available in the JVM. He also goes into the many, many GC-related JVM settings you can tune. However, as he acknowledges, the default settings are often hard to improve upon. The reason why many programs display bad behavior under memory pressure is that they are not written with a clear understanding of Java's memory model. They allocate too many objects, or hold on to objects for too long even if they're no longer needed (e.g. "head retention" in Clojure).

As powerful as the JVM is, it can't magically fix your broken programs. Unfortunately many memory pressure problems remain hidden until you encounter production workloads. What I'd like to see most is practical advice on how to avoid these problems in the first place, how to debug them if they occur, and how to effectively test your code for leaks/memory bugs.

[1] http://www.amazon.com/Java-Performance-Definitive-Scott-Oaks...


This particular article isn't even about a Java problem. The author is just trying to use more memory than is actually available.

If the program is written in C++ instead, what'd happen is it'd keep allocating memory beyond the 4 gig limit she imposed on the JVM, until it hit swap and the entire machine bogged down and became slow, or until the kernel OOM killer randomly killed some other (possibly important) program on her desktop to try and make space for it.

If she tried to fix that with ulimit, then she'd get different behaviour - the program would die quickly without slowdown, but before actually using 4 gigabytes of heap, due to fragmentation.

In the latest Java release there's a flag that makes the JVM exit as soon as there's an OutOfMemoryError (or do a heap dump for diagnostics), and there's also -XX:+UseGCOverheadLimit which makes the JVM give up sooner if it's spending more than 98% of its time garbage collecting (i.e. it has effectively hit the limit of its heap, even if not quite).
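A hedged sketch of the flags being described (flag names as of roughly JDK 8u92; the jar name is a placeholder, and you should check that your JVM version supports them):

    # Exit on the first OutOfMemoryError, or capture a heap dump for diagnostics;
    # UseGCOverheadLimit makes the JVM give up when ~98% of its time goes to GC.
    java -Xmx4g \
         -XX:+ExitOnOutOfMemoryError \
         -XX:+HeapDumpOnOutOfMemoryError \
         -XX:+UseGCOverheadLimit \
         -jar mytool.jar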


If the program is written in C++ instead, what'd happen is

I think it's worth pointing out that, if my experience with C++ vs. C# translates at all to C++ vs. Java, if the program had been written in C++ instead, its memory footprint would have been somewhere between 1/10 and 1/4 of the Java version.


The thing you missed is that C++ doesn't allocate as many objects. For a start, a std::vector allocates exactly one "object", namely the underlying buffer. All the elements are placed in this contiguous memory. When the buffer is full it is reallocated.


Great comment.

I disagree about manual memory management though, you don't have to traverse the heap, unless you use a semiautomatic scheme like reference counts.

Complex heap structures often aren't a problem for manual memory management.

The scenarios that kill GC aren't ones you typically worry about in manually managed code.

GC frees you from some concerns and gives you much safety but makes performance harder and less predictable. For many programs this is a good trade off.


> "avoid allocating objects on the heap"

Sometimes I see this as a side effect of all of the available libraries for java. Abstraction gets to the point where it's not easy to predict the behavior of something you're using.

Like, you're using Tomcat, with CXF client libs, which...under the covers, uses HttpURLConnection. Your app works great with http, but you need to switch to https. Unknown to you, when you switch to https...your object count doubles. Because the design decision all the way down at the bottom was to spin up a new object (per connection) to handle an SSL handshake.


These are all great points. Unfortunately most of these issues manifest themselves with a delay. Which makes for very fun conversations between the dev team and the ops team.

My cynical self views garbage collection as technical debt for memory management. Sure, it's unfair because modern GCs will be way better at managing memory for medium complexity projects than any home-grown solution. But when the project gets complex—as many mature ones do—memory management which was so blissfully delegated to the GC becomes a sore issue. But by that time, it's in the context of a lot of complexity going on so it is not only harder to troubleshoot but harder to remediate.


Yup, nice advice. The biggest trouble with GC is that there are so few absolute numbers, so the gotchas can be hard to understand for most people. Luckily, most people aren't usually affected by them.

Often, bad allocation practices survive until the system is pushed hard, and by that time it gets harder to change those things. It's much easier if you have a sense of bad allocation patterns at the start of the design. It gets especially bad with libraries doing heavy allocation; with the level of abstractions and dependencies that we usually have today, you can end up with very bad allocation behavior even if your own code is neat.


Enter threads. Now you cannot simply have a singleton built at the beginning. Maybe you can pool things in ThreadLocal, or maybe it's more complicated.

I'm eventually gonna have to learn me some Erlang, I fear.
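A minimal sketch of the ThreadLocal pooling idea mentioned above (the buffer type and size are arbitrary):

    import java.nio.ByteBuffer;

    public final class Buffers {
        // One 64 KB scratch buffer per thread, created lazily and then reused,
        // so hot paths stop creating fresh garbage on every call.
        private static final ThreadLocal<ByteBuffer> SCRATCH =
                ThreadLocal.withInitial(() -> ByteBuffer.allocate(64 * 1024));

        public static ByteBuffer scratch() {
            ByteBuffer buf = SCRATCH.get();
            buf.clear();                       // hand back a reset buffer, not a new one
            return buf;
        }

        private Buffers() {}
    }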


This has probably nothing to do with GC tuning[1] or with Java's GC being slow (or any other GC), and most likely to do with either a bug in the program (a leak) or a misunderstanding of how the program uses memory. It's not the "GC ruining your day", but the GC not being able to fix a bug in your program and/or cram a 5 GB RAM usage into a 4 GB heap.

[1]: Which is relevant if you're trying to turn a 100ms pause into a 15ms pause, or get rid of the 2sec pause you get once every few hours.


There is either a leak: The application keeps pointers to a lot of objects that are actually never going to be used.

Or running out of memory: The application keeps pointers to objects that will be used later.

Both problems are solvable, you remove the pointers or change the algorithm respectively. (If you simply can't add more memory.)

The real hard problem is that the JVM takes a long time to report an OOM error. But it's not unique to Java; who hasn't seen servers become unresponsive in a low-memory situation?


The problem is not just the time it takes, but that most garbage collection algorithms are stop-the-world (not sure if any of them are truly concurrent). This can introduce correctness problems.

I used to work on a network management software that used ICMP polling to detect if network devices were down. We had a SEDA architecture, requests were put on a queue, timers were set and if the device did not respond within a timeout, we would mark the device as down.

Problem was, it so happened that in a high load system after we sent out the request, the garbage collector would kick in and take eons to return the system to running state. When the system returns, the timer events would fire and the handlers would note that the timeout has expired and mark the devices as down. The device could have responded in time to the requests but the system would not have detected it.

This is why I am wary of languages with mandatory garbage collection. I feel it should be a library in any serious systems language.


See https://news.ycombinator.com/item?id=11555017 . Azul's Zing is pauseless: it never has to do a stop-the-world collection, which they say takes about a second per GiB on an ordinary JVM.

It has threads which concurrently collect as other threads mutate, uses clever VM tricks such as bulk operations with only one TLB invalidation (or at least they did that with an earlier version of the current collector, they couldn't get it into the mainline Linux kernel and now use a DLKM). It's the only non-toy currently maintained pauseless/concurrent GC that I know of.



If one has to worry about allocating or not allocating objects on the heap, what is the difference between worrying about memory management that way (and suffering in memory consumption and performance because of the garbage collector), and doing memory management manually with alloca(3C) or malloc(3C) in C, with pretty much guaranteed performance?


Well, when you use a GC, you don't have to figure out where to put the "free" calls. This sounds like a minor thing, but it lets you write code in a very different style (have you ever tried writing a functional program with explicit malloc and free?)

That said, there are other ways to avoid writing "free" than using a garbage collector. Regions (https://en.wikipedia.org/wiki/Region-based_memory_management) are faster at allocation than malloc (you just increment a pointer to allocate) and faster at freeing than a GC (you throw away the entire region when you're done with it). It seems tricky to base a general-purpose programming language around them, though.
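A very rough sketch of the bump-pointer idea behind regions, written in Java over a plain byte[] (hypothetical; real region systems hand out typed objects, not raw offsets):

    public final class Region {
        private final byte[] memory;
        private int top;                                   // the bump pointer

        public Region(int capacity) { this.memory = new byte[capacity]; }

        // Allocation is a bounds check plus one addition.
        public int alloc(int size) {
            if (top + size > memory.length) throw new OutOfMemoryError("region full");
            int offset = top;
            top += size;
            return offset;
        }

        // "Freeing" is resetting the pointer: the whole region dies at once.
        public void reset() { top = 0; }

        public byte[] memory() { return memory; }
    }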


> Well, when you use a GC, you don't have to figure out where to put the "free" calls.

That is precisely why I mentioned alloca(3C): it automatically frees the memory for you, if you do not want to do it yourself. From the Solaris / illumos alloca(3C) manual page:

  void *alloca(size_t size);

  The alloca() function allocates size bytes of space  in  the
  stack  frame  of  the  caller,  and returns a pointer to the
  allocated block. This temporary space is automatically freed
  when  the  caller  returns. If the allocated block is beyond
  the current stack limit, the  resulting  behavior  is  unde-
  fined.
> (have you ever tried writing a functional program with explicit malloc and free?)

I got my start on MOS 6502 / MOS 6510 / MC68000 assembler, so for me making malloc(3C) and free(3C) calls when programming in a functional style is completely normal. I have no problem with that whatsoever.


The part where behavior is undefined when you overflow the stack makes alloca difficult to use safely, but it is very nice when you can use it!

Did you write your 6502 code with closures, higher-order functions, and so on? My point is that it can be hard to figure out when to free an object in this kind of environment, where a value can be captured by multiple closures and may not have a clear owner.


Then either use C and manually manage memory, or use ANSI common LISP, and no problem.


  If the allocated block is beyond
  the current stack limit, the  resulting  behavior  is  unde-
  fined.
And you even quoted the part that hints at why alloca(3C) is so rarely used in practice: you get a pointer result, but you have no way of knowing if it's actually safe to use it.


This should tell one whether the pointer returned by alloca(3C) is safe to use:

  #include <sys/time.h>
  #include <sys/resource.h>
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
    struct rlimit limit;
    getrlimit(RLIMIT_STACK, &limit);

    /* rlim_cur is in bytes, so do the subtraction on a char pointer */
    printf("Stack start at %p, end at %p.\n",
           (void *)&argc, (void *)((char *)&argc - limit.rlim_cur));

    return 0;
  }

  % cc limit.c -o limit && ./limit
  Stack start at 0x7fff51094ba8, end at 0x7fff4f094ba8.
If the pointer returned by alloca(3C) is within those memory addresses, and does not exceed the bottom (in this example, 0x7fff4f094ba8), it is safe to use.


The difference is that reviewing your memory allocation is a matter of program performance, not of program correctness. And you don't have to do it everywhere; just make sure that in the hot spots of your program there is no excessive heap allocation. The benefit of GC is the correctness you get by never having to call free() and the drastic simplification of all program parts which do have to allocate memory. In C, for anything that does get allocated on the heap, you have to carefully track who is in charge of deallocating it later in the program's lifetime and make sure that there are no pointers to it left. GC takes this complexity out of your program, which can make a lot of code much simpler.


The thing is, the garbage collector often does not. Where I currently work, we have Java applications which use 300 GB of memory. To me as an assembler coder, that is so wrong on so many levels. If we were doing a finite element analysis with a 100 million by 100 million matrix calculation in order to compute the cavitation inside of a nuclear reactor (and even that software, written in Fortran, didn't need more than a few GB!), I could (perhaps) understand, but we're not doing anything even remotely close to that.


Right, but that problem is mostly Java, and less so GC. Java pretty much forces you to allocate everything on the heap, there are no value types in Java, and most Java libraries tend to be complex and heap-heavy. We have a web site here run by Tomcat which takes something like 4 GB of real memory, crazy! (It also does not help that Java strings use 16-bit characters.)

GC languages require a certain amount of extra memory beyond the live memory as an overhead, but beyond that they should not have more impact on the memory footprint. In some use cases, this extra memory is not acceptable, but for reasonably sized applications, this is acceptable (not talking about Java here...) and the benefit is the correctness and often cleaner code (no complex protocols for object lifetime).


My experience writing a bidder for realtime ad exchanges in Java -- which was a mistake driven by our use of some 'legacy' code -- is that the numbers work out on average but not in the 90+th percentile. Heavy tuning of the GC yields better results, but there's always something that comes along and causes burps. Throughput is usually fine on average -- but the latency is spiky.

If your problem domain is fine with that, that's great. But I will never use Java for something latency sensitive again.

After I left that job I worked on the other side of RTB, on the exchanges themselves. They were both written in C++, and performance was reliable and awesome.

I would only use something like C++ or Rust for this purpose.


We're having a lot of success with our RTB app written in Go. All we've done is tune up the back pressure on the GC so we trade some memory for less GC time.


Was this done with the G1 garbage collector introduced in Java 7? If not, this is worth a new test.

It fixes a lot of the problems that used to come up, and as long as you let it run wild with memory and you are doing some parallelized work (like everything real-world), you should not see this problem.

There should be either no more stop-the-world pauses, or fewer of them.

There are a few things you can do to avoid this problem altogether:

The biggest improvement in speed vs. memory will come from not passing primitives as function parameters. When you do that, you are passing by value, not by reference. If you wrap the bunch of ints that you pass to a function, you can save a lot of allocation cycles.

Another good change that you can make is an object pool. There is a really good and fast implementation in JMonkeyEngine/LWJGL. They have a low-level, thread-friendly object pool.
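Not the JMonkeyEngine/LWJGL code, but a minimal single-threaded sketch of what such an object pool looks like (names hypothetical):

    import java.util.ArrayDeque;
    import java.util.function.Supplier;

    public final class Pool<T> {
        private final ArrayDeque<T> free = new ArrayDeque<>();
        private final Supplier<T> factory;

        public Pool(Supplier<T> factory) { this.factory = factory; }

        // Reuse a parked instance if one exists, otherwise allocate a new one.
        public T acquire() {
            T obj = free.poll();
            return obj != null ? obj : factory.get();
        }

        // Hand the instance back instead of letting it become garbage.
        public void release(T obj) { free.push(obj); }
    }

Usage would be something like new Pool<>(StringBuilder::new); a thread-safe version needs a concurrent deque or per-thread pools.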


Just a note, G1 probably still isn't ready for rock-solid production usage. E.g. bugs like this are still cropping up: https://bugs.openjdk.java.net/browse/JDK-8148175

That's a pretty scary bug. Who knows how stuff like that will trash your data if you aren't properly checksumming everything.

CMS and other GCs have the advantage of years of bug-squashing and tuning. G1 is exciting, but I wouldn't personally use it on anything important for quite some time.


That bug does not seem to affect Java 7; it only mentions 8u80 and 9. Still very bad though.


Yep, just more recent versions, and it's been fixed too. But if you watch the bugs that keep popping up for G1 you see stuff like this fairly regularly.

Of course, that's a totally unfair comparison: CMS has had a decade of bug squashing...I'm sure it had equally scary bugs when it was new. But that's the point. Don't use new, shiny GC's because they are still squishing bugs :)

(Sorry, preaching to the choir, I just get frustrated by everyone claiming G1 will solve all their problems without investigating potential downsides)


Actually CMS is only five years older than G1. The first published papers for G1 are from 2004. But yes, G1 has a long history of scary bugs. There was a time a few years ago where practically every week a crashing bug was fixed.

G1 addresses certain problem areas of CMS and replaces them with others. Honestly I hope that ten years from now we have better choices in HotSpot than CMS or G1, but right now it doesn't look like it (if you don't count Shenandoah, which has other issues).

That said, I have recently seen G1 performing exceptionally well in production: 120ms GC pauses with a 120 MB/s sustained allocation rate with basically default settings (apart from GC logging).


It seems semi-common to run HBase with G1GC nowadays.


See the logs he posted:

    [Full GC
      [PSYoungGen: 10752K->9707K(142848K)]
      [ParOldGen: 232384K->232244K(485888K)] 243136K->241951K(628736K)
      [PSPermGen: 3162K->3161K(21504K)],
      1,5265450 secs]

-> parallel old

A full GC frees 140K of old and 1045K of young memory. He's almost running out of memory with Parallel Old. He needs to run a much larger heap. G1 isn't going to be any better with a live set the size he has in combination with his max heap.

All this is explained in Java Performance [1]: http://www.amazon.com/Java-Performance-Charlie-Hunt/dp/01371...


You remind me of the time when I worked on a biology project in 2007. I developed a program in Java to analyze DNA sequences; each run could easily handle a DNA sequence file of over 3 GB without any issues. But, just out of curiosity about how fast the process could be, I rewrote the program in C. The result: the C version was about 3 times faster than the Java one, but developing the C version cost me about a whole week (2 days for the Java one).


That's why in practice Java programs tend to be faster if the application is nontrivial. By the time you've finished the C version you would have completed the initial version plus a few performance tuning cycles had you written it in Java.


We have a few healthcare customers in our portfolio; except for the usual HPC research use, everything else in their daily tooling is a mix of R, Java and .NET.


I actually wrote a blog post on how I keep an eye on JVM garbage collection at

http://gitsense.github.io/blog/realtime-process-monitoring.h...

My indexers are designed to automatically shut down if they have spent more than 30% of their time doing garbage collection in the last 10 minutes. If they shut down on purpose, my background Perl script will restart them.

However, if they shut down X number of times in a row, my Perl script won't restart them. Multiple consecutive shutdowns usually mean I'm pushing the system too hard and need to tweak my indexers' thread settings.
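The script itself isn't shown, but the JVM side of a "percentage of time spent in GC" check can be done with the standard management beans; a simplified sketch (the 30%/10-minute policy would live in the caller):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public final class GcWatchdog {
        private long lastGcMillis;
        private long lastWallMillis = System.currentTimeMillis();

        // Fraction of wall-clock time spent in GC since the previous call.
        public double gcFractionSinceLastCheck() {
            long gcMillis = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                gcMillis += Math.max(0, gc.getCollectionTime());   // -1 means "unsupported"
            }
            long now = System.currentTimeMillis();
            double fraction = (double) (gcMillis - lastGcMillis) / Math.max(1, now - lastWallMillis);
            lastGcMillis = gcMillis;
            lastWallMillis = now;
            return fraction;
        }
    }

Call it periodically and shut down once the fraction stays above 0.30 for the window you care about.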


This is probably the wrong title for this article. In the article she gets right up to the available memory limit and keeps trying to allocate more, so the VM is forced to work increasingly hard to find the memory requested. I wouldn't call that slow GC, just trying to use too much memory.


The old rule of thumb is that with GC one needs at least twice the memory of the max live heap. With little free space, the GC has to forgo a lot of the optimizations that bigger free space allows. Another Java-specific rule is that with heaps over 1-2 GB one must think about how to make data structures GC-friendly or how to split the application into separate processes. I guess the example program violated both of these.


Here is a pretty comprehensive article on the Java garbage collector that has helped me a lot: https://plumbr.eu/handbook/garbage-collection-in-java

It touches on the basic aspects of garbage collection, and dives into the different kinds of GC available for Java at this time.


Correction: a specific GC implementation of a specific Java implementation can be really slow.


On a specific, memory-leaking, use case.


You can write Java code to skip GC and manage memory by yourself. Like C++ http://www.mkyong.com/java/java-write-directly-to-memory/
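The linked article uses sun.misc.Unsafe; a tamer, supported way to keep bulk data out of the collector's reach is a direct ByteBuffer, where the GC only sees one small wrapper object (a sketch, with an arbitrary layout):

    import java.nio.ByteBuffer;

    public class OffHeapCounters {
        public static void main(String[] args) {
            // One million longs live outside the normal Java heap; the GC never scans them.
            ByteBuffer counters = ByteBuffer.allocateDirect(1_000_000 * Long.BYTES);

            counters.putLong(42 * Long.BYTES, 7L);                  // write slot 42
            System.out.println(counters.getLong(42 * Long.BYTES));  // prints 7
        }
    }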


It's nothing like C++.

In C++ you get automatically managed memory on the stack.

You get RAII and smart pointers to help with heap allocations/deallocations.

Most importantly, you get to _use_ the system's malloc implementation whilst you have to _implement_ your own malloc with the Java off-heap solution you suggest.


I'm not a C++ expert; RAII sounds to me like a synchronous GC in C++ -- resources are automatically allocated and deallocated according to their lifetime (https://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initia...) -- correct me if I am wrong.

Java, on the other hand, has an asynchronous GC -- it uses different generations, and does not release an object right away just because it goes out of scope.

Anyway, when I said 'like C++', I meant the way C++ can manage memory manually and directly; having learned from what you mentioned, I should have said 'like C'. Thanks.


RAII is an automatic memory (more generally, resource) management scheme, but it isn't garbage collection. A garbage collector is a runtime component that reclaims unused storage. What RAII does is, at compile time, insert the appropriate resource-freeing calls at the right places.


The problem with Java off-heap designs compared to C is that many Java language features become useless (e.g. instance variables, polymorphism, ...) and you lose the standard library as well as almost every single third party library out there.

In my view, sun.misc.Unsafe is extremely unproductive to work with. But I will say that it may be the right solution for adding a small, very specialized feature to a larger Java application.


In my experience there are usually one or two collections which consume most of the memory. It is easy to solve 90% of cases with a few simple optimizations.


Same for Javascript, at least when aiming for perfect 60FPS, the only good garbage collection is the one that doesn't occur after init, and maybe when switching maps or whatnot. Even the 0.09ms in the article is way too much and means a skipped frame. Maybe a way to think of it is to treat GC like optional automatic destructors, which should get called (more or less) when you want them to, not as something you just "don't have to think about" (if you run something in a loop and need it to be silk smooth, that is).

Incidentally, this is great: https://www.mozilla.org/en-US/firefox/46.0beta/releasenotes/

> Allocation and garbage collection pause profiling in the performance panel


> Even the 0.09ms in the article is way too much and means a skipped frame.

He wrote 0.09 seconds. In any case, if you're generating megabytes to gigabytes of garbage in a single frame you probably deserve a GC pause.


It's very hard to control garbage. So many operations have unavoidable garbage side effects, especially those involving immutable arguments.


A good GCed language should give you good control over garbage generation. If you don't have that control, the language is to blame, not the concept of GC. But indeed, I think one of the strongest disadvantages of Java is that it does not give you much control over garbage generation.


What languages give you more control over GC than Java? My only experience is Java and Go. Go gives you literally one knob, whereas Java allows you to tune MANY factors on top of picking a collector.


I was talking about garbage generation. For best performance, you want to control heap allocation and the memory layout. Java gives you no controls there. All objects in Java are heap allocated, and you can reference them only by pointer. In Go you can have value types. An array in Go can be a block of structures, while in Java it would hold pointers to separately allocated objects. Also, if you have objects as member variables in a Java object, they cannot be part of that object, but need to be separate objects on the heap. This rules out quite a few optimizations.
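To make the contrast concrete, a sketch of the usual Java workaround (a hypothetical Point example): parallel primitive arrays instead of an array of objects give one contiguous block per field, at the cost of less convenient code.

    public class Layout {
        // Array of objects: a million pointers to a million small, scattered objects.
        static final class Point { double x, y; }

        static Point[] asObjects() {
            Point[] pts = new Point[1_000_000];
            for (int i = 0; i < pts.length; i++) pts[i] = new Point();
            return pts;
        }

        // "Structure of arrays": two contiguous primitive blocks, nothing for the GC to trace.
        static double[][] asArrays() {
            return new double[][] { new double[1_000_000], new double[1_000_000] };  // xs, ys
        }
    }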


Modula-3, just as one possible example.

Sadly the whole set of DEC, Compaq, HP acquisition processes killed it.


Oops, sorry. I did write ms, but meant seconds. Even 9ms would be too much, but 90ms would be crazy in a game.

Though megabytes of garbage are easy to "achieve" if you don't pool objects, call functions with temporary objects as parameters or return values, etc. It really just depends on how much you do that way, which can easily be a way that works fine during execution, or even seems really elegant, but bites you in the ass at GC time. Especially since 60 frames a second doesn't mean you're not running some other loop for the logic more often than that, and it's not like you ever have the full 16ms for rendering and logic either.

"Silky smooth" to me means no dropped frames, none, once the game loop is running and the JIT warmed up. A browser might never be able or willing to guarantee that, but even a few megabytes to GC every 10 seconds means a guaranteed hiccup. Yeah, you can't avoid outside influences, but you can avoid your code dropping frames even in a vacuum. Nothing more, nothing less, and right now, for Javascript in browsers, that means knowing when to avoid GC among other things. As far as I can tell, and while I'd be happy to be proven wrong, I'm not sure anyone even honestly engaged with it, much less disproved it.

Say you have a game that might spawn and kill hundreds or thousands of enemies and/or projectiles per frame, and those enemies are more than just a pair of coordinates. You simply can't use something like new/delete for that, period. If you haven't even tried, be it in the context of game or something comparable, and dismiss what I say based on some notion of what I "deserve" or what one should criticize about a language (when I was simply talking about what you need to do today, to get a specific thing done in a specific environment), you're really missing out.

The question (to me) isn't what language should you use to make the most accessible kind of simple game, the question is how to get shit done in Javascript. I'd be happy to try something better before it gets adopted, but no, I won't pour a lot of work into a game made with it, I might dabble and if I fall in love I might advertise it, but to lay a foundation I might be still using years later, that does need the adoption having taken place (past tense), not it being the objectively best idea.

I'm not "complaining about GC" either, I do prefer it to memory holes. But show me anyone writing seriously about making games with Javascript claiming you shouldn't pool objects etc., or decent libraries that don't do it for you. I'll happily admit I don't know much, but this one I laboured over and read about and experimented with more than enough (because I just couldn't get rid of the hiccups, no matter how simple I made things) to be rather sure of what I said above. I know it's off-topic, I know people might not care, I know it's no great insight, but that doesn't make it false. It was something that tripped me up and which made a huge difference to me, and it's something I see very rarely discussed outside of people actually making games. That's why I brought it up, hopefully those actually engaging in anything that might benefit from this will look into it, enjoy and you're welcome.

https://hacks.mozilla.org/2013/05/optimizing-your-javascript...

> JavaScript can spoil us when it comes to memory management. We generally don’t need to worry about memory leaks or conservatively allocating memory. But if we’ve allocated too much and garbage collection occurs in the middle of a frame, that can take up valuable time and result in a visible drop in FPS.

And since as of yet there is no way to control when GC occurs, and no way to enforce it not taking longer than X ms, you have to know how to avoid it. It's not that you always have to avoid it, but you do need to know how to, at least if you write a smooth game, a physics library etc., anything that runs for a while in a quick loop and does more than a handful things which should not drop frames when avoidable.


Ah for the good old days when we just tied our animation routines to a vertical blank interrupt and that was that. :-)


Disclaimer: I'm the CEO of jClarity who produces Censum.

For those who are looking to read the arcane output of a Java GC log, you can grab a 7-day free trial of Censum (https://www.jclarity.com/product/censum-free-trial/) - it parses GC logs (Java 6-9 all collectors) and gives you a host of analytics and graphs to help you figure out what's going on. We've also got blog posts on GC (https://www.jclarity.com/blog) and our slideshare http://www.slideshare.net/jclarity


Disclaimer: I don't work for jClarity who produces Censum.

This is a super valuable tool, which I recommend people take a look at should they have the misfortune to need to read a Java GC log.

Great work, jClarity!


Plugging my own follow-up:

    https://news.ycombinator.com/item?id=11555129


A very good post, but like much discussion here it turns into a comparison of Java vs. C++. It would be good to also compare to Go, which is GCed but gives you all the value types of C++. The value types of Go are probably the reason they get such good GC performance (version 1.6+) without all the complexity of the HotSpot GC.


I think at this point it is clear that those who care about memory usage and deterministic performance will use C/C++/Rust (maybe). Saying Java is not up there will bring Java supporters arguing endlessly about how Java is so superior to anything else in the market and how Java's GC is state of the art. It would not matter to them how much expert-level tuning it takes to make it work. Yeah, and then there is Azul Zing: a heavily over-provisioned system on top of already over-provisioned Java systems, to have better GC compared to the Oracle/OpenJDK GC.

Working in Java for 10 years made me realize that so many of the solutions the Java/JVM ecosystem provides are to problems that the Java ecosystem created in the first place.


> Working in Java for 10 years made me realize that so many of the solutions the Java/JVM ecosystem provides are to problems that the Java ecosystem created in the first place.

There are lots of companies making a living selling tools to track down and fix memory corruption issues in C and C++.

Java is not alone.



This article would have been much more informative had we seen the actual program. Usually it doesn't much matter how big a file you read from your program, assuming you are reading for aggregation purposes. If you need individual records, then that is what databases are for. So I smell a large red herring. Is GC a problem? Sure, it can be. Usually, though, when it is, it's more likely to be my problem, not the JVM's.


I forgot the exact number, but I remember reading somewhere that for a garbage collector to work well you need to give it something like 3 or 4 times more space than you will actually be using at maximum allocation - not counting transient stuff.

This allows it to shuffle things around more effectively while it is cleaning up, doing things like copying into a compacted area.


Two simple suggestions:

The author mentioned reading in files with x "number of lines". If they are then parsing the lines into some structured format, there are likely many opportunities to look for low-cardinality fields and to reduce object tenuring by pooling strings, using either String.intern or a hash set.

They should also consider increasing the eden size.
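A hedged sketch of the string-pooling idea, using a plain HashMap as the canonical set (String.intern is the built-in alternative, with its own trade-offs):

    import java.util.HashMap;
    import java.util.Map;

    public final class StringPool {
        private final Map<String, String> canonical = new HashMap<>();

        // Return one shared instance per distinct value, so low-cardinality fields
        // (country codes, status names, ...) don't tenure millions of duplicates.
        public String pool(String s) {
            String existing = canonical.putIfAbsent(s, s);
            return existing != null ? existing : s;
        }
    }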


it's been a really, really, long time since i've read something about computers and been completely and utterly baffled by what i saw.

thanks for making me feel young again.


Do you really need to read all 8 million lines into memory? Wouldn't it be better to have some kind of streaming: read up to a point, do some work, read the next batch?


Why not just assume so? Aren't there valid reasons to keep all in memory?


Garbage collection can cost a firm money. Working a few years ago at a large investment firm, lots of the code used to trade via algos, make markets and do index arb was written in C++. Deterministic, performant, reliable - it just worked as planned and expected.

New tech lead comes in, swaggers about, declares all the street now uses Java for their trading code, so we should too. I got the desk heads to listen and be wise to the impending issues, and they told him fine - but if the new Java-based code had a direct impact on PnL, then it was his budget; ultimately, he would pay for it. Cocksure, he agreed.

Despite throwing bucketloads of Java devs at it and spending fortunes on "tuning" consultants, performance suffered, GC did interfere under critical trading conditions, and eventually he was exposed and kicked out.

The C++ code came back out of retirement, was updated for C++11/14, and still serves them well to this day.


Quite a few of the top HFT firms use Java, almost exclusively.

The trick is to segment your critical path code (whatever is executing the actual trades and is highly susceptible to delays) from any business logic. Then you can focus on making critical path components fast - disable GC completely, keep object allocations way down, audit every line of code, etc. With such a setup you can even do better than typical C++ because you avoid virtually all object allocation/deallocation costs that C++ has. Java object allocation is dirt cheap, it just hurts when you GC, which you can basically avoid or schedule for out of hours. If you want better I would not bother with C++ at all personally, but use C / assembly or even look at FPGA type setups.


This sounds like simulating arena allocation in Java by disabling GC temporarily. Except it's worse than arena allocator, because it's still mixed up in a general heap and costs more at collection. Not sure I buy the argument that this is cheaper than C++. After all, in C++ you could just drop the whole arena instead, so you get cheap alloc + super-cheap free.


Collection cost in that model is irrelevant as it's only a split second and happens after hours.

Yes, you can emulate a young generation in C++ or even multiple young gens (i.e. arenas), obviously, as you can do anything in C++ that you can do in Java and vice-versa (java has the Unsafe class that allows you to do low level programming with pointer arithmetic etc). The question is not, can it be done, but rather what's easier and lower cost?

With the Java approach, the only developer overhead is making sure you don't do too much allocation on the hot path, then sizing your young gen to avoid collections (one command line flag), and doing a GC at night. Not nothing but not too hard either; after all, you can allocate if you really need to, and some allocations will be optimised out anyway.

The C++ equivalent would be to allocate as much as possible on the stack, and then make sure to do a restart every night to eliminate any heap fragmentation that is left. Not that different, until you screw up and try to delete something on the stack or forget to delete something that isn't. That's when the robustness the GC is giving you starts to pay off.
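A rough sketch of the "size the young gen, collect after hours" part (the flag value and schedule are made up for illustration; System.gc() is only a request, though HotSpot honors it unless explicit GC is disabled):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public final class NightlyGc {
        // Run the JVM with e.g. -Xmn12g so the young generation can absorb a whole
        // trading day's allocations without a collection (size illustrative only).
        public static void scheduleAfterHoursCollection(long hoursUntilClose) {
            ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
            ses.scheduleAtFixedRate(System::gc, hoursUntilClose, 24, TimeUnit.HOURS);
        }
    }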


> The question is not, can it be done, but rather what's easier and lower cost?

I thought it was about speed.

> The C++ equivalent would be to allocate as much as possible on the stack, and then make sure to do a restart every night to eliminate any heap fragmentation that is left.

The C++ equivalent would be to do heap allocation and never free, then shut the process down which will return its memory to the OS. Heap fragmentation wouldn't matter here since you aren't freeing any memory.

Fundamentally if you can avoid freeing any memory all day then you should just allocate it up front an be done with it. Anything else is ignoring the more fundamental point that you are using a limited amount of memory and pretending that you aren't.


Runtime speed is, as noted, going to be similar, with the winner being determined by which compiler optimisations benefit your code the most.

Yes you could also pre-allocate, but that's often more awkward than just allocating at the time of need, and besides, Java can turn heap allocations into not only stack allocations, but fully scalarise them (i.e. break down an allocation into a set of local variables and then delete any that are unused, move them around, etc).


> Runtime speed is, as noted, going to be similar, with the winner being determined by which compiler optimizations benefit your code the most.

This could not possibly be further from the truth. Getting rid of heap allocations can speed something up by ~7x. Cache locality can speed something up by 50x. Compiler optimizations do very little compared to accessing memory in a predictable way.


Indeed! In Java or dynamic languages you are forced to always use a reference, which means you'll get cache misses pretty much everywhere.


> The C++ equivalent would be to allocate as much as possible on the stack, and then make sure to do a restart every night to eliminate any heap fragmentation that is left.

I think that's a very brute-force approach. You could instead use an actual allocator which behaves the way you want.

For example this is how arena can work without any restart/stack magic: https://github.com/cleeus/obstack/blob/master/arena_test.cpp...

Same happens in google's protobufs https://developers.google.com/protocol-buffers/docs/referenc...

There's no need (almost) for manually wiping heaps at system level if your language allows you to change the allocator at any point of the app.


This is very interesting - is there any sample code one can learn from to build something in this style of programming? Especially the deployment - do you have a scheduler that triggers GC, or do you simply kill and restart, etc.?


Actually you can trick the JVM into running a Full GC. If you run:

    jmap -histo:live <PID>
it will show a memory map of all the live objects, and since Java needs to run a full GC cycle to do that, it will actually run a Full GC.


> better than typical C++ because you avoid virtually all object allocation/deallocation costs that C++ has

So to have Java perform better than C++ you just have to avoid allocation in Java and leave it in in C++?

> If you want better I would not bother with C++ at all personally, but use C / assembly or even look at FPGA type setups.

There is nothing in C that gives it a speed advantage over C++. Basically you've discovered step 1 of 3 when optimizing software (step 0 is to profile):

1. Minimize heap allocations

2. Rework memory access for cache locality

3. Use SIMD

1 and 2 can be done with C++. 3 really needs to be done with something like ISPC so that SIMD use isn't fragile and compiler dependent.


I was going to say: for people who are going to lengths like kernel-bypass networking to get these trades out faster, I wouldn't be surprised if the key stuff is in assembly.


I would be. Manually written assembly is not necessarily better than what a compiler would generate, and with superscalar, pipelined architectures, seemingly better assembly can turn out to execute in the same amount of time. The things that matter are algorithmic.


I think you assume compilers are doing more work than they actually are. Processor models in the compiler are generally fairly simplistic, and even ICC can easily be beaten by hand-tuned assembly.

Where compilers shine is allowing broad process-wide optimization.


> Quite a few of the top HFT firms use Java, almost exclusively.

such as whom?


If you're disabling GC, why use Java at all?

That's the point of using Java (or any interpreted language), isn't it? Avoiding manual memory management?

If you're going to go through all that horrid FactoryFactory Factory = new FactoryFactory(); bullshit, might as well get the benefits of the GC.

And if you don't want the GC, why not write your code in something else?


> If you're going to go through all that horrid FactoryFactory Factory = new FactoryFactory();

I get the impression that your only exposure to java is through jokes and blogs from dynamic typing evangelists.


How would that show any connection to dynamic typing? I imagine that the GP is coming from the perspective of C based on the fact that they're criticizing the use of garbage collection.


You'd be incorrect.


There are no popular languages that both have no GC and are comfortable to work in. Rust is going to be one, though I feel it's not entirely comfortable yet.

Garbage collection is just one of the things that make languages like Java comfortable to work in, but certainly not the only thing. Disabling automatic garbage collection in Java or C# I feel is a very effective way of gaining most of their advantages while limiting the effects of the runtime on the variability of your execution time.

Java is not an interpreted language by the way. FactoryFactories are a choice, not necessarily forced by the language.


How is your statement not just saying that taking any language considered comfortable, removing garbage collection, and requiring explicit free operations would render it uncomfortable?

Also, Java is interpreted at the bytecode level by default. The HotSpot JIT works by detecting hot areas of code and then JIT-compiling them.


What sense does it make to speak of a language as being interpreted or compiled? These are attributes of an implementation. The existence of GCJ, a native-code Java compiler that's been around for years, proves this.

It's true there are languages that are essentially uncompilable, and so have to be interpreted. Perl is the best example I can think of; there are constructs in the language that can't be parsed correctly without runtime context. Python, Ruby, and JavaScript are all difficult to do AoT compilation on as well, at least if you want to do much optimization, but I wouldn't agree that they can't be compiled. But Java is not as dynamic as these languages.


When talking about Java, you can mean the standard library, the virtual machine or the language. The term Java itself is ambiguous.


I don't think it must be absolutely so that a language without garbage collection is uncomfortable, as I said I feel Rust comes very close.

The only popular languages without a garbage collector are either from the 70s (C) or based on a language from the 70s (C++xx). Both have extreme discomforts when compared to modern languages (Ruby/Haskell/C#/Rust).

Also, technically, that would make Java bytecode an interpreted language, not Java itself ;)

edit: Ok, apparently according to wikipedia I am wrong and Java generally is considered to be an interpreted language, I had a slightly different definition of interpreted in mind.


Yeah, when C displaced Pascal in CS programs (mid to late 80s?), it felt like an uncomfortable step backward even then.

Why were we programming in (little more than) assembler as our main language???


Mid to late 90's in Portugal.

When I came to learn C, I was already quite an expert in Turbo Pascal, version 6.0 by then.

Comparing C to Turbo Pascal 6.0 in terms of features and safety just felt like "meh!".

Thankfully, around the same time someone gave me a copy of Turbo C++ 1.0, and I learned how to get some of the Turbo Pascal features back while keeping C's portability, and joined the C++ ranks for a few years.


Similar experience. I picked up a copy of TP 5.5 and finally had an easier way to build my "ADT"s (abstract data type library type code).

THEN, I got a copy of TC++ a few months or a year later. WTF?!?

The light came on when I read Scott Meyers's "Effective C++" book in 96 or so. "Effective" meant "not stepping on one of 50 or so common land mines". I realized what a bad joke C++ was, and decided never to go back to it.

C is useful as a substitute for assembler, but not for most applications. Too bad we didn't get to see more of Smalltalk in school in the 80s. And maybe Scheme instead of just a little bit of Lisp.


C++ first came into existence in 1983. C was made in the 1970s and was the basis for C++, but modern C is rather different than the original C because it assimilated C++'s strong type system. As for popular languages without a garbage collector, here is a list off the top of my head:

* C

* C++

* FORTRAN

* Objective C

* Objective C++

* Pascal

* Swift

Making a new Turing complete language is ultimately just an exercise in how to do the same things differently rather than better unless you find a way to construct a language that can use hardware more effectively than the existing languages can. Languages that lack garbage collection provide no incentive to wipe the slate clean for easier garbage collection.

There is an enormous difference between expert level use of C and languages like it and beginner use. There are static analysis and dynamic analysis tools that have been built to assist with catching bugs such as misuse of memory or undefined behavior. C and several others that I listed above are flexible enough that you can implement design patterns that you would expect to see in more "advanced" languages and with structured programming, you can use them fairly easily once they are written. Functional programming is doable:

https://github.com/cioc/functionalC

Object oriented programming is also doable:

http://ooc-coding.sourceforge.net

Generic programming can also be done using a mix of macros and void pointers. A rather powerful design pattern that I have seen in C is a function that encapsulates iteration over some data and, for each item, invokes a callback along with an accumulator that the caller passed in. It is great for iterating over things that are not expected to always fit in system memory. The non-accumulator version of that pattern is used in the POSIX standard ftw C function. The accumulator version feels much like generic programming, as the type of the accumulator is known only to the caller and the callback; the iteration function has no clue about the accumulator's type. The same goes for plenty of in-memory data structures implemented with void pointers, like lists and trees, where the memory describing each node is encapsulated inside the object.

There is definitely a greater learning curve to C and languages like it, but once you are familiar with the right patterns/abstractions, such languages are a joy to use, and the advantages of more "advanced" languages look more like trade-offs than killer features.

Also, people who are familiar with such languages tend to program differently. At work, I have been asked to write some userspace code in Go. I wanted to make some directory traversal code use as few CPU resources as possible (which is a design goal), so I asked for tips on how to do system calls from Go and horrified at least one Go programmer in #go-nuts on freenode in the process. Using the syscalls directly enabled me to take advantage of SYS_getdents64's d_type to avoid doing a stat to determine whether an entry is a directory or not, to increase the buffer size so more directory entries are read per syscall (fewer calls per directory), and to detect the end of a directory by checking when the buffer has 65535 bytes of free space remaining, which reduces the getdents64 invocations from 2 to 1. A programmer who does everything the way that garbage-collected language authors recommend would likely have had a far less CPU-efficient traversal. The superfluous stat syscalls alone would have increased the number of syscalls by at least an order of magnitude.

I wrote a patch to glibc last night that enables readdir() to skip the second getdents64 call on small directories, and I plan to submit it after I have what I consider to be the final version. That ought to accelerate GNU find. I might give the Go OS package similar treatment, although doing fast directory tree traversal the way I am doing it (which is similar to what GNU find does) requires that the Go OS package provide type information from getdents. That is a non-portable BSD extension that Linux adopted and consequently is something that I would not expect Go to provide.


Comfortable is a relative term. There are people comfortable working in C or C++.


True, there's a relevant photograph here[1]. Although I think that even for them, when they fall asleep or otherwise let their guard down, the programming experience can get real uncomfortable real quick.

[1] https://en.wikipedia.org/wiki/Bed_of_nails#/media/File:RGS_1...


>Java is not an interpreted language by the way.

Java compiles to native machine code?


Most of the time yes. The JIT will generate machine code for almost all usual code paths after little time.


Yes, it all depends which JDK you make use of.

Apparently many seem to think the OpenJDK is the only option available.


That factory code you mentioned isn't _just_ a Java thing, but a very well-known and wonderful design pattern.

Wikipedia says it best: "[The] Factory pattern deals with the instantiation of object without exposing the instantiation logic."

Practical benefits:

- Can help rid of excessive switch and if/elseif statements and splits up your logic

- Eliminates new keyword to improve testability


As with most language jokes/prejudices, I think the hate comes from seeing the misguided, incorrect or unnecessary uses of that pattern.

Java's prominence in the corporate world means Sturgeon's law is more apparent.


Never heard of that law. Brilliant.


If you give Java a try, it's got a lot more impact than Factory. Play around with the language a bit, there are things in the JVM and the language that really change the way you think about programming.


Sure. I now know:

* What a great feature function pointers / procedural types / other call-by-name/reference mechanisms were. (J8 lambdas are used like function references, but are kind of heavy)

* What a source of mystery "annotations" are.

Java started out like an interpreted version of Delphi or UCSD Pascal disguised as a cleaned up C++, but since 1.1 it evolved slowly and/or in the wrong ways.

Actually, as somebody who has been programming since 1983, in quite a few languages, Java brought little to nothing new to the table. Nothing personal against you, I'm just sick of the Java language/syntax and how it failed to live up to the hype.


I'm a fan of Java in some senses, but curious to hear what things you discovered in the language or JVM that changed how you thought about programming.


For me it is the vastly superior tooling.

* Best IDEs around. (Yes better than .net ecosystem)

* JVM monitoring (Flight Control, VisualVM)

* Performance Sampling/Profiling <1% perf cost

* actually being cross platform (yes, mono, bla)


Ability to use Clojure, Scala, Frege, Groovy, Java, JavaScript, Python, etc. all on the one platform and share libraries and objects between them.

People forget that Java has the best ecosystem of libraries bar none especially when you get into the more enterprise areas e.g. Big Data.


One-and-done is my favorite way of thinking about using Java. The JVM provides the perfect, for my use case, amount of abstraction from the hardware while still allowing you to do some amazing things.

There is also a huge amount of features that are supported by the JVM that Java still doesn't make use of.

There is also a great JIT and garbage collector.


Yes, but I suppose this is more the JVM than the language itself. The JVM is quite convincing though, hence Clojure, Scala, etc.


Because Java is the new COBOL -- especially if you statically allocate everything for performance reasons. At that point, you really are left with something not unlike COBOL with separate compilation and the ability to allocate scratch counters (or other primitives) on the stack locally, but all of your main data is still copy books^H^H^H^H^H^H^H beans allocated in the DATA DIVISION.


My experience with trading systems led largely to the same conclusions.

You CAN write GC code that does very little collection, but you end up being allocation sensitive anyway, so you may as well write c++ code.

For instance, I wrote some .NET where, instead of logging with strings, I'd use StringBuilder and try to get some reuse. Problem is, SB wasn't written for that, and will allocate again if your new string gets too big. In the end I used a pool of StringBuilders of various lengths. C++ would not have had this issue.
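
In Java the same trick looks roughly like this (a sketch of the idea rather than the code from that system; names and sizes are made up):

    // Reuse one pre-sized StringBuilder per thread and reset it with setLength(0)
    // instead of allocating a fresh builder for every log line.
    final class LogFormatter {
        private static final ThreadLocal<StringBuilder> BUF =
            ThreadLocal.withInitial(() -> new StringBuilder(4096));

        static String formatLine(String symbol, double price, long qty) {
            StringBuilder sb = BUF.get();
            sb.setLength(0);               // keeps the existing backing array
            sb.append(symbol).append(' ').append(price).append(' ').append(qty);
            return sb.toString();          // toString() still copies, but the builder itself is reused
        }
    }

The same caveat applies: if a line exceeds the initial capacity the builder grows (one allocation), but the larger array is then kept for subsequent calls.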

The problem with GC tends to be something alluded to in the article. When trading is brisk, a lot of data is coming in, so memory pressure is higher, leading to more GC. That's exactly when you don't want it. And I always find debugging managed code more complicated than debugging C++: you need a whole suite of memory debuggers, all with their own idioms, and then they only really give you a hint. With C++, you can use valgrind et al., or override new() and put in some accounting code, and it's normally quite clear where you allocated.

You can also just tell the OS you'll deal with memory yourself. Just get a big piece, and keep track of what's what using various arenas and such constructs. That way you're not even exposed to the variance in the OS allocation time (time to find space for a new object).


I'm not saying GC is a great idea but c++ doesn't necessarily solve your allocation problem: https://groups.google.com/a/chromium.org/forum/m/#!msg/chrom...

It seems like high performance systems should just be coded like microcontrollers, no OS or minimal embedded real time OS and you manage all your memory yourself.


Yes that's right. Essentially you need complete control over what happens, without having to pass control back to the OS.

Of course, just using C++ is not a solution. There's a bunch of guidelines you need to stick to, especially with strings.


What I find sad is that Java garbage collection isn't better than manual memory management, given the vast investment in the jvm. There are plenty of ways in which garbage collection can be faster, but production software is still in the stone ages when compared to research from 10 years ago.

For example, any scenario where Rust doesn't even need a lifetime annotation is a scenario where the compiler of a garbage-collected language could manage collection itself at almost zero cost. It could, very simply I might add, infer the most efficient way to collect block-scoped objects... the vast majority of which end up getting collected in the short-lifetime pool anyway. Long-lived objects might be more efficiently reference counted, so why can't the compiler decide that singletons should be reference counted, or even statically allocated, instead of being traced during every large collection? And why can't we have a disposable interface that actually performs a manual collection?

For a language that is as memory heavy as Java is, with as much boxing as it does, and the high performance demands of its user base, it blows my mind that their best idea for improvement in garbage collection in the last decade was to naively go after more concurrency. I realize that is what everybody thought the future was for everything, but something something Amdahl's law. There are so many better opportunities that involve allocating less, destroying without a mark stage, etc.


What you're describing is escape analysis and the resulting SROA. These are things that the HotSpot VM does in fact do.
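
For anyone unfamiliar with the terms, a contrived illustration (mine, not from any HotSpot documentation) of the kind of allocation this can remove:

    // The Point below never escapes distSquared(), so HotSpot's escape analysis
    // can scalar-replace it: the two doubles live in registers/on the stack and
    // no heap allocation happens on the compiled hot path.
    final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    class EscapeDemo {
        static double distSquared(double x, double y) {
            Point p = new Point(x, y);   // candidate for elimination
            return p.x * p.x + p.y * p.y;
        }
    }

Whether the optimization actually fires depends on inlining and the JIT tier, which is part of why GC performance can be hard to reason about.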


That is news to me (thanks for the clarification), but unless I'm misunderstanding you, escape analysis is only for block scoped allocations. It doesn't perform any optimizations for large and/or long lived objects so they can bypass expensive large heap mark stages, and it doesn't allow for deterministic destructor interfaces. As far as I can tell, the only industrial GC language out there attempting to do this sort of thing is Swift.


Deterministic destruction RAII style is handled in GCd languages with try-with-resources or using type constructs.
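
For example, in Java (a minimal sketch):

    // close() runs deterministically when the try block exits, regardless of
    // when (or whether) the GC reclaims the reader object itself.
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    class ReadLines {
        public static void main(String[] args) throws IOException {
            try (BufferedReader r = Files.newBufferedReader(Paths.get(args[0]))) {
                String line;
                while ((line = r.readLine()) != null) {
                    System.out.println(line);
                }
            } // r.close() has been called by this point, even if an exception was thrown
        }
    }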

Modern collectors like G1 don't do full mark/sweeps of the entire heap in normal operation - only in rare cases like if you request a memory profile, or if the entire process runs out of RAM and needs to do a hard stop. And reference counting isn't something you can just automatically do due to the presence of cycles.

Notwithstanding these things, you may be interested in Graal, and specifically this talk on automatic region allocation:

https://www.youtube.com/watch?v=3JphI1Z0MTk&list=PLX8CzqL3Ar...

http://www.oracle.com/technetwork/java/jvmls2015-wimmer-2637...


A program that segfaults at the wrong moment can ruin your day. It is a tradeoff. It boils down to the fact that bad memory handling produces bad results: in C++ these are memory leaks or dereferences of invalid pointers; in GC languages, you pay with performance impacts. And, very occasionally, one technology is completely unsuited for a given task. But that should not lead to such blanket statements.


Well Java can crash and slow down due to GC pauses. But at least you get a nice stack trace.


You could call Java a very versatile language :)


This sounds like a ho-hum IT department.

Finance is software and has been for a while. Competent players in this field wouldn't base their entire implementation strategy on one expert's view. More likely, they would have more substantial trade-off discussions.

I would guess this shop had much bigger problems than its choice of stack.


There's a difference between using a tool that's not up to the job and employing people who aren't up to the job. Java is just fine for this kind of application.


Did he try Azul's Zing or their previous Vega hardware solution for pauseless JVM GC?

Not free by any means in $$$ or performance, but they should have avoided any pauses due to GC.


Do you actually mean that I need an expensive, dedicated hardware accelerator to get Java's performance to match C++?

Why not do as they actually did: use C++ ? Sounds simple enough... and much cheaper.


No, in many cases JVM performance will match and sometimes even outperform a C++ implementation, and I've seen a few examples of that, although math-heavy operations are usually not among them. In terms of size and memory usage, a C++ implementation will almost always win.

The problem with Java is that it's not as deterministic in terms of performance as C++, so in areas where that is a must, it has to be taken into account. Even after saying that, the GC can, and indeed should, be tweaked and optimized. The JVM gives you more tweaking options and tools to optimize the garbage collection than any other platform I've worked with, but most developers don't know about them or don't care.
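
To give a flavour of what those tweaking options look like, a common Java 8-era starting point is something like the following (illustrative flags and values only; the jar name and numbers are made up, and the right choices depend entirely on the workload):

    # Fix the heap size, pick a low-pause collector, give it a pause-time goal,
    # and log what the GC actually does so you can verify the effect.
    java -Xms8g -Xmx8g \
         -XX:+UseG1GC -XX:MaxGCPauseMillis=50 \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
         -jar trading-app.jar

Setting -Xms equal to -Xmx avoids resize churn, and the pause target is a goal G1 tries to meet, not a guarantee.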

In cases like the one the original comment mentioned, it may have been better to continue using the C++ implementation, or it might have been useful to spend a bit less on consultants and developers and a bit more on Azul's Zing (and maybe a few people who understood the platform). It might have saved a lot in the long term, and indeed there are plenty of algorithmic trading solutions written in Java.


A few things to consider:

* it tends to be harder to write good C++ code than Java. For example we think that an average person can write better java code than C++ code because java is simpler to get going with. We also think that the average developer makes fewer mistakes in java than C++ (because C++ allows you to do 'crazy things!')

* that means that you need (on average) a better developer for C++, and that tends to cost more

* java tooling is exceptional (probably the best there is?)

* making small changes and releasing is much quicker in Java; builds tend to be shorter in duration for example

* once you have the framework for your system, the differentiator is then the business benefits delivered. If you can push small changes out very quickly you're getting rewards quicker and you can experiment more

* once you get down to certain performance points, writing Java code becomes similar to writing C++ code, but the JVM is also doing various work that you can't control so much. That can then inhibit you.

Other things:

* Azul isn't just h/w development; it's also available in kernel-plugin form now, iirc

* Have a look at Aeron (https://github.com/real-logic/Aeron); I think the C++ and Java versions have fairly similar performance, and they're designed and written by super decent (industry-renowned) developers

* A lot of people write high performance code making extensive use of templates. That's a totally different way of programming than usual C++


Boring C++ is often faster than boring Java code. It's safer to write hacky Java, but code reviews should not let hacky code in.

C++ costs more, but that's often a non issue for a trading floor. Security is much simpler in the Java world, but also possible in boring C++ code.

Making small changes is really more a question of your code than anything else IMO. There is a lot of crazy C++ and Java code out there.

Now, for basic CRUD apps Java is a clear win. For high-performance trading that's an open question. But, IMO, there are much better languages to use.


What about a third way? I have not used the GNU version of Ada or Pascal, but you get C/++ style allocation choices (static, stack, heap) without all of the safeties turned off.

Having bounds checks (with line number error reporting, rather than secondary, tertiary... damage followed by a core dump), and the use of non-null references (as well as pointers for the initial allocation from the heap) goes a long way towards eliminating much of the time wasting bullshit that C/++ brings into your life.

C is good for portable assembler. Otherwise, I want something less wacky than C++ to be "effective" with at a medium low level, and something higher level than Java for most other apps.


Where are you going to get a team of Ada/Pascal developers from ?

And tools to support them ?


And Enterprise Management has spoken!

Ignoring the fantasy that I would get to use something higher level (than Java) for most work...

Not everybody panics if they have to write code outside of Eclipse or Visual Studio.

Once upon a time, there was an expectation that programmers knew, or could learn, more than one language.

Sorry about the rude response, but I'm really sick of the dumbing down of everything for (counterproductive) business or "risk management" purposes. To paraphrase, "average performance" in this industry is pretty poor.


Azul isn't THAT expensive. These days it's just x86 software anyway. Financial trading firms are actually one of their top customers, if I understand correctly, and Oracle is developing features for low latency work partly driven by the demands of algorithmic trading shops.


OK, but it's more expensive than "free". They didn't previously list their prices, but now I see at the excellent https://www.azul.com/products/zing/zinqfaq/ :

Zing is priced on a subscription basis per server.... The annualized subscription price for Zing per physical server ranges from $3500 (for several hundred servers) to $8000. Higher volumes and longer subscription terms will reduce the per-server price for Zing. Pricing for virtual servers is also available upon request.

Since you're likely to be running it on a server with a hardware cost approximately the same or larger, next time I'll only use two dollar signs ($$) ^_^.

And I've indeed heard it's popular with financial trading firms for the obvious reasons. Up to 1 TiB heap, no problem, and no 1,024 second pause which other collectors require for a full GC.


C++ is just free, because you happen to have clang and gcc to choose from among a pool of otherwise commercial compilers.

I remember when all C++ compilers worth using were commercial.


I had to solve stop-the-world pauses that were causing occasional intolerable lag spikes in a multiplayer game. The solution was to switch to Java's G1 GC, and also to deadpool and reuse every object and byte array I possibly could.
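
The "deadpool and reuse" part boils down to a free list of fixed-size buffers, something like this simplified sketch (not the actual game code; names and sizes are made up):

    import java.util.ArrayDeque;

    // Recycle fixed-size byte[] buffers so steady-state packet handling allocates
    // (almost) nothing and puts very little pressure on the young generation.
    final class BufferPool {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();
        private final int size;

        BufferPool(int size, int count) {
            this.size = size;
            for (int i = 0; i < count; i++) free.push(new byte[size]);
        }

        synchronized byte[] acquire() {
            byte[] b = free.poll();
            return (b != null) ? b : new byte[size]; // allocate only if the pool runs dry
        }

        synchronized void release(byte[] b) {
            if (b.length == size) free.push(b);      // only pool buffers of the expected size
        }
    }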


I wonder if Go's GC would exhibit pathological behavior in this case.


Go's current garbage collector isn't generational. And it aims to complete its stop-the-world phase in < 10ms, so even the young generation collection which the author describes as "fast" would be unacceptable (90ms).


Tldr: if you allocate nearly all heap memory, GC performance is bad. Who would have thought. What an insightful post.


Please don't post snarky dismissals to Hacker News. It degrades the community.

If you have a substantive criticism to make, please make it neutrally, so those of us who don't know what you know can learn something. Putting others' work down distracts attention from the subject matter, makes you sound like a jerk, and makes HN a bad place to be.


The article may not be news for you, but it doesn't pretend to be a comprehensive guide: it introduces common issues and explains their relation through an experience report. We do need to talk more about garbage collection. Many developers do not understand very well how GC works, and there's a lack of high-quality discussion online. The article and the discussion it's sparking helps demystify the topic.


I agree. It has a very click-baity title though. From my experience GC is usually very fast, except when the program runs out of memory or over allocates all the time due to bad programming.


The point of the article was to show how the % of used memory will affect performance in a non-linear manner. It's obvious once you know, but it's not if you've never experienced it before.


So yeah, parallel old has high latencies. In other news water is wet and the sky is blue. If you care about GC latencies don't use parallel old.


In other news, the sky is blue.


Comments like this break the HN guidelines. Please post civilly and substantively, or not at all.


Can you make it part of the guidelines to not use ego? Without ego, comments contain substance and are civil.


That might be setting the bar a little high.


Containers will slow down your app



