Go’s march to low-latency GC

_ph_ · on July 6, 2016

Another very nice feature of Go is that, since 1.5, the whole runtime, including the GC, is written in Go itself. So every Go developer can look at the implementation. The GC itself is a surprisingly small amount of code.

chrisseaton · on July 6, 2016

I never understand this argument. Are there people who know enough about GC to understand potential issues and have an informed opinion on how to improve it, but can't read C?

_ph_ · on July 6, 2016

"read C" is a very loose term. Yes, I can read C. But, as I am not using it in daily practice, I am rusty. As a Go programmer, on the other hand, I am quite trained in reading Go code. You also only need to know the behavior of one compiler (the Go one), rather than also needing a good understanding of what the C compiler does.

Also, Go is a much clearer and strongly typed language, so Go certainly is a much nicer implementation language than C. (If I thought C was better suited, I would be using C in the first place...)

chrisseaton · on July 6, 2016

Right, but are you also skilled in writing garbage collectors? (You may be, I'm not assuming you aren't. I'm not.) Otherwise the fact that you can read Go more easily than C doesn't make a difference, as what would you do after you've read the code if you don't understand what any of the algorithms do or why they are designed that way?

I'm sure it's very good that as much of the runtime is written in Go as possible, but I think people are being too optimistic when they hope it will empower people who aren't already skilled in compilers or garbage collection to contribute.

_ph_ · on July 6, 2016

Of course, being a Go expert does not make you a GC expert automatically. You need to be both. But why should you also be a C expert? Adding another whole field of expertise to the requirements does not sound like an improvement.

And everything written in Go also means you are dealing with just one compiler and not two, if you mix Go and C code.

chrisseaton · on July 6, 2016

I don't think you should also have to be a C expert, but I am suggesting that, in practice, there is nobody in the world who is proficient in garbage collection who does not also know C. I think if you learned everything needed to understand GC, but were never exposed to C, you would already know enough to pick the language up in a couple of hours.

Nobody is going to turn up in the Go IRC room saying that they have a great idea how to reduce pause times by improving the work-stealing between concurrent markers by using a better lock-free queue algorithm, except do'h they don't know C.

I get your point about just one compiler though - less moving parts is good.

geodel · on July 6, 2016

One benefit you miss is that by using Go for the GC, all the Go tools (fmt, profiler, vet, godoc, lints, etc.) are available to the writers of the GC as well, which was not possible with C.

I think it is similar to what Oracle was trying with Maxine, a JVM written in Java. JVM contributors or potential contributors would know C++, but according to the OpenJDK website one motivation was to leverage the amazing Java tooling to write its own VM.

I just noticed Oracle seems to have removed references to Maxine VM from Oracle website and OpenJDK website. Seems that project is no longer active.

chrisseaton · on July 6, 2016

Try searching for Graal instead.

https://github.com/graalvm/graal-core

geodel · on July 6, 2016

Thanks.

From the link:

> Graal is a dynamic compiler written in Java that integrates with the HotSpot JVM

I am not sure if this is the Maxine VM, which I thought was something analogous to the HotSpot JVM. Or maybe Maxine was similar in scope to the link you have given and not a replacement, experimental or otherwise, for the HotSpot JVM.

chrisseaton · on July 6, 2016

It's the JIT compiler written in Java that was in Maxine, updated, integrated into Hotspot so you don't need to use an experimental VM to use it.

Maxine is still alive though - but I think the only people maintaining it are academics.

geodel · on July 6, 2016

Ok, makes sense. Maxine might now be independent of Oracle, as the links from the Oracle website go nowhere.

sievebrain · on July 6, 2016

It isn't Maxine. I believe Maxine lives on in the Oracle-only "Substrate" project. But there is also Jikes which is a similar jvm-in-java project.

TickleSteve · on July 6, 2016

equivalents to all those tools (and more) are available in the C ecosystem.... just saying....

BTW: C is a much more mature ecosystem than that of Go.

PopsiclePete · on July 7, 2016

C has an eco-system? Really? Yeah, sure I can download source code and headers and somehow use 12 different antiquated tools that are strung together with duct tape and bubble gum and that are also not standard on Windows, let's say, to actually hopefully compile that code, and then worry about 12 other different tools each with 6 different options to be able to do something simple like "link a library", and hope that works on FreeBSD and OS X and Windows, but it won't, it never does, without spending an insane amount of time tweaking headers and m4 macros, before giving up and learning CMake, but if that's what you consider a "mature eco-system", then you have very low standards.

_ph_ · on July 6, 2016

No one doubts that there is a rich ecosystem available for C. The thing just is, when you are working in a Go environment, what benefit would adding C with its own toolchain bring?

coldtea · on July 7, 2016

No need for a bootstrapping step, 90% of portability taken care of for free, ability to use lower level and faster primitives than Go offers, access to tons of great C libraries.

koffiezet · on July 8, 2016

> 90% of portability taken care of for free

You sir, have clearly never attempted to write portable C or Go code. Writing portable C code takes a serious effort. It's not hard if you know what you have to pay attention to - but 90% portability taken care of for free? That's simply not true, unless you think being portable is "it runs on a POSIX system".

Writing portable Go code - in most cases - you don't need to do anything or make only slight changes to your code, and cross-compiling is the easiest I've ever encountered.

ngrilly · on July 7, 2016

> ability to use lower level and faster primitives than Go offers

Like what? Go's unsafe code is free to do almost anything (except for things like array bounds checks, which are always enforced).

drivebyops · on July 7, 2016

Not the build system; it's a pain to get the libraries on Windows for a multiplatform project that I wanna compile.

Whereas in Go this is very easy.

cbsmith · on July 7, 2016

There have been similar Smalltalk-in-Smalltalk, Lisp-in-Lisp, and Java-in-Java, etc. projects.

I have yet to see one that led to better garbage collection.

heavenlyhash · on July 10, 2016

> Nobody is going to [...] have a great idea how to reduce pause times by improving the work-stealing between concurrent markers by using a better lock-free queue algorithm, except do'h they don't know C.

I think you do yourself a disservice to discount that.

I'll happily tell you about places in the go runtime where we could use some smarter memory fencing instructions to build faster lock-free queues on x86_64.

I also don't write C. (Well, I'm trying to write a patch to libgit2 right now, but really, the operative word there is "trying": it's just highlighting it all the more clearly: I don't know C.)

I learned about the memory fence instructions while doing concurrent programming in java. The notion that C is the only bridge we can cross -- or the right bridge to cross -- to get to assembly (or any other abstraction layers necessary for high performance engineering) is absurd. We can put it to rest now.

coldtea · on July 7, 2016

>Of course, being a Go expert does not make you a GC expert automatically. You need to be both. But why should you also be a C expert?

You shouldn't, but historically and statistically GC experts and compiler experts are also C experts. And it's not like "Go is written in Go" is gonna change that (we've had languages written in themselves for half a century and still most compilers are written in C/C++).

_RPM · on July 6, 2016

I'd argue that Garbage Collection and the Go language are mutually exclusive.

EpicEng · on July 6, 2016

I'd argue that the term 'mutually exclusive' means that one thing cannot coexist in the presence of another. So... no.

_ph_ · on July 6, 2016

What do you mean by that? Obviously Go has a GC.

PopsiclePete · on July 7, 2016

I can read both Go and C, but not in the style that is preferred by core Go runtime developers. I struggle a lot to read the code when all important data structures are one-letter names - G, P, M, etc. I understand not wanting Obj-C style identifiers, but single-letter ones?

Go runtime code reads like `x := g.b(a.C)` and you have to do quite a bit of manual cross-referencing of variables and identifiers to even get a vague idea of what is going on. It obviously somehow works for them.

startling · on July 6, 2016

You don't need to be an expert in garbage collectors to get value from reading how your particular garbage collector works.

cbsmith · on July 7, 2016

You don't need to be, but if you can't read C, I'm not sure how much extra value you are going to get from reading how a particular garbage collector works.

EpicEng · on July 6, 2016

You may use Go all day every day, but that doesn't mean you have the skill to implement or extend a GC. If you have experience in that area you almost certainly know C like the back of your hand.

BraveNewCurency · on July 6, 2016

Let's pretend this is true today. What about the future? Is there any benefit to restricting the pool of people working on the GC?

EpicEng · on July 7, 2016

Are you really restricting the pool though? Most devs can't write or efficiently develop a GC. Learning a language is easy compared to writing a reasonably performant and correct GC. Anyone in this area worth their salt can pick up C if they don't know it already and, again, they probably already do if they have experience at this level.

I've been hearing 'C is dead' for fifteen years now, but it hasn't gone anywhere.

coldtea · on July 7, 2016

We've had compilers being written in their own language for half a century, but it hasn't changed the fact that most (and all the successful ones with millions of users) are written in C/C++.

cakeface · on July 6, 2016

I've frequently wanted to understand the actual workings of the JVM garbage collector but I get bogged down reading the C/C++. If you want to know exactly how a program works then reading the source is a great way to do it. Someone who is not a GC expert could become at least GC proficient by reading the code that does say the object tree walking. I know that having the language implementation source available has been useful to me in Java. I go to the implementation of BigDecimal or ArrayList to really see how things work. The same can be true of compilers and runtimes. It's much easier if you are not context switching or required to know different languages.

pjmlp · on July 6, 2016

Just check Maxine or JikesRVM, two examples of meta-circular JVMs.

maaarghk · on July 6, 2016

It doesn't seem like an argument to me. It just seems like an interesting fact about Go. I don't think the post you are responding to is trying to say anything about GC so I've no idea what your comment is getting at. I'm yet to use Go but intuitively it seems like a good thing that the runtime is written in Go. It presumably reduces context switches if debugging an issue that seems to come from the runtime, regardless of if that is a GC related issue.

e3b0c · on July 7, 2016

One thing that is very helpful is that you can navigate the code in your editor using the exceptionally effective code navigation tool for Go. With the tool one can very easily and precisely jump to the definitions and references without building an index beforehand.

colordrops · on July 6, 2016

Excuse my obvious ignorance, but how is the GC written in Go? The GC in Go is not optional, right? Does the GC use GC? Turtles all the way down?

4ad · on July 6, 2016

In general, memory allocated by the Go runtime is garbage collected, yes. When the garbage collector runs, there isn't really any difference between user and runtime goroutines anymore. But the garbage collector itself doesn't create garbage implicitly.

Go is a language that does not expose stack vs. heap allocation as a primitive to the programmer. Memory is allocated in the best place possible, preferably on the stack, but if that is not possible it is allocated on the heap. But in the runtime we need to control the generation of garbage, so runtime code is compiled with a switch that forbids implicit heap allocations (code won't compile if it requires transparent heap allocations). Heap memory is allocated by calling a function like runtime.mallocgc. However, this memory is garbage collected just like everything else (e.g. there is no free).
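To make the stack-versus-heap point concrete, here is a minimal sketch in ordinary user-level Go (not runtime code). The names `point`, `sink`, `stackSum`, `heapPoint` and `allocsPerCall` are made up for illustration. It uses `testing.AllocsPerRun` to count heap allocations per call; building with `go build -gcflags=-m` should additionally print the compiler's escape-analysis decisions.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// sink is a package-level variable; storing a pointer here makes the
// pointed-to value outlive the function that created it.
var sink *point

// stackSum keeps its point local, so the compiler is free to place it on the stack.
//
//go:noinline
func stackSum(x, y int) int {
	p := point{x, y}
	return p.x + p.y
}

// heapPoint returns a pointer to a value it created, so that value escapes
// and has to be heap-allocated.
//
//go:noinline
func heapPoint(x, y int) *point {
	return &point{x, y}
}

// allocsPerCall reports the average number of heap allocations per call of f.
func allocsPerCall(f func()) float64 {
	return testing.AllocsPerRun(1000, f)
}

func main() {
	stack := allocsPerCall(func() { _ = stackSum(1, 2) })
	heap := allocsPerCall(func() { sink = heapPoint(1, 2) })
	fmt.Printf("stackSum: %.0f allocs/call, heapPoint: %.0f allocs/call\n", stack, heap)
}
```

In user code the second function simply allocates transparently; the runtime build mode described above would instead be expected to reject that kind of implicit heap allocation at compile time.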

uluyol · on July 6, 2016

How does one implement malloc in C? (Edit: ignore this. As pointed out in the responses, this is actually different since GC calls are inserted by the compiler)

Unsafe code, direct syscalls, using only a subset of language features, and coupling to the compiler (both for generating data used by the GC as well as inserting calls into the runtime in appropriate places).

None of this would really be any different if implemented in C. C is clearly an unsafe language, the syscalls would still be there, as would the coupling to the compiler. The difference is that you have to have a fast way to call from Go into C. With Go this is unnecessary.

Of course if you're really curious, you can always check out the source :)

haberman · on July 6, 2016

There is a pretty big difference between these two cases.

In C, calls to malloc() are explicit. You implement malloc() in C by not calling malloc().

In GC'd languages, the language runtime calls the garbage collector implicitly. So you need some more clever way of ensuring that these implicit calls do not occur. You also need to ensure that no garbage is created that will leak without a GC.
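As a rough illustration of "an allocator that does not call the allocator", here is a sketch of a bump allocator in Go that hands out pieces of a fixed inline buffer. The `arena` type and its methods are hypothetical names for this example; this is not how the Go runtime's allocator is actually structured.

```go
package main

import "fmt"

// arena is a fixed-size region handed out piece by piece. The buffer is an
// inline array, so alloc never has to ask the general-purpose allocator for
// more memory.
type arena struct {
	buf  [1024]byte
	next int // offset of the first free byte
}

// alloc returns n bytes carved out of the arena, or nil if not enough is left.
func (a *arena) alloc(n int) []byte {
	if n < 0 || n > len(a.buf)-a.next {
		return nil
	}
	// The three-index slice caps the result so callers cannot grow into
	// memory that belongs to a later allocation.
	p := a.buf[a.next : a.next+n : a.next+n]
	a.next += n
	return p
}

// reset releases everything at once; there is no per-object free.
func (a *arena) reset() { a.next = 0 }

func main() {
	var a arena
	x := a.alloc(16)
	y := a.alloc(2000) // larger than what is left in the arena
	fmt.Println(len(x), y == nil, a.next)
}
```

The harder part in a GC'd language is the one described above: making sure nothing in such code triggers an implicit allocation behind your back.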

_pfxa · on July 6, 2016

> How does one implement malloc in C?

This one is unrelated. C the language does not depend on malloc being present, whereas the GC is part of the language in Go.

The important thing for the grandparent comment is that Go is not interpreted, but compiled. Thus when the Go compiler (i) is being compiled by another Go compiler (ii), the (i)'s own GC code is not utilised, but the already-compiled (ii)'s one is used. After that it's all machine code.

mark-r · on July 6, 2016

One of the signs of maturity in a language is when it reaches the point that it can implement itself. C wasn't written in C originally either.

andrewvc · on July 6, 2016

I can't speak for go specifically but in some languages you can use a manually allocated subset of the language to write the gc.

smegel · on July 6, 2016

> Does the GC use GC?

Think of the Go runtime as like the kernel of an operating system. It doesn't have to follow the rules of a user-land process.

knorker · on July 6, 2016

It's not yet idiomatic or good Go code now, though, is it?

IIRC the initial state (1.5) was mostly machine translated code from C to Go.

4ad · on July 6, 2016

The runtime is written by people, not machines.

The compilers and low-level toolchain libraries that encode machine instructions have been machine-translated.

The runtime is an example of very good low-level Go code that breaks safety rules. However, it is not a good example of user-level Go code. Note that even though the Go code in the runtime heavily uses unsafe, the runtime is much safer and more stable (we find fewer bugs) than when it was written in C.

_ph_ · on July 6, 2016

The main thing is that they machine-translated the Go compiler itself from C to Go, so the compiler did not have great Go code. The rest of the runtime probably was already human-written in Go before that. And of course, anything which has been under development since then is human-written, like the new SSA-based compiler.

EdiX · on July 6, 2016

> The rest of the runtime probably was already human-written in Go

`find -name '*.X' | xargs wc -l` in src/runtime says

Go 1.4:

    21701 .c  33348 .go 19160 .s

Go 1.7b2:

    3340 .c 77597 .go 33827 .s

The assembly growth seems to be attributable to the increased number of supported platforms

4ad · on July 6, 2016

You should exclude runtime/cgo, since it's not relevant.

giovannibajo1 · on July 6, 2016

There's also an increased number of optimized code-paths, like for instance some crypto code in the TLS stacks. The natural way to optimize Go code, for instance to benefit from advanced CPU instructions in modern architectures, is to fall back to assembly.

vorg · on July 6, 2016

> since 1.5, the whole runtime is written in Go itself

The parser was written in Yacc (with C code generated) until version 1.6. I'm wondering if there's any other parts of Go yet to be converted to Go.

r1ch · on July 6, 2016

I have to wonder - when you're digging down into this level of complexity in order to discover issues with the language you're using, wouldn't something like C be better? IRC isn't a very hard protocol and you know the language won't be getting in your way if you're using C.

zamalek · on July 6, 2016

> wouldn't something like C be better?

Jim is an intermediate-level coder, by and large the average guy that you are going to get. He writes his IRC server in C. It performs acceptably and can be scaled horizontally. There are a few threading bugs and exploits (buffer overflows etc.).

Sally is an advanced coder, it took a year of recruitment to find her. She also writes her server in C. It's blazingly fast. Virtually nobody else understands how it works. She's a human, so it's still littered with the same types of bugs that Jim's server has.

Jack is at the same level as Jim. He starts off in Go 1.4. While his server is nowhere near as fast as Sally's, it's much faster than Jim's. Race conditions and exploits tend toward zero. Everyone on the team can approach the code and maintain it.

Go 1.6 is installed on prod and suddenly Jack's server is now negligibly slower than Sally's. Jim notices this and has to spend a few weeks on his to catch it up. Sally is stuck debugging a race condition that occurs once a month. Jack is adding more emoticons, more features and decommissioning servers in the cluster.

Edit: IRC is a simple problem and that begets a simple solution. While C may be significantly simpler than C++, Go requires far less cognitive effort: it is actually simpler than C.

nitwit005 · on July 6, 2016

"Simpler" obviously didn't happen here. They had to debug the garbage collector, which included looking at traces that went into the OS. They effectively debugged a Go and C mix.

I'm afraid the truth of these things is that if you try to squeeze maximum performance out of some of these more sophisticated languages, you have to be able to understand and debug the runtimes that come with them. Not that many people are up to that, which means the hurdle can be higher.

zamalek · on July 7, 2016

I'm not implying that C is objectively inferior. Quite the contrary: there are tools that suit a specific problem. Maybe Twitch did spend money tuning Go, maybe that was a waste. Point is: they got a free performance boost thanks to a language that prioritizes what you want to do instead of how you do it.

I'm a big believer in early performance optimization, premature if you will. Though, ultimately, what I can crack out in a day with C# is worth months of C. Iterate, even with languages. If, after profiling, you find that your expressive (as opposed to explicit) language is wanting, it's time to iterate into a different language. Get something competent out of the door, spend more money on it only if you have to. Right now our socket library consumes less than 1% of the CPU (GC and all); until everything else catches up there is no benefit to getting any closer to the metal.

DanWaterworth · on July 6, 2016

I find it hard to believe Sally is so advanced if she writes code that others have difficulty reading.

It shouldn't take a year to find someone who writes unmaintainable soup.

zamalek · on July 6, 2016

If you had me on an IRC server that had to service the numbers that Twitch is talking about; I would use every single server coding practice I am aware of. Someone else comes along. Maybe his first refactor would be to remove the buffer pooling. Maybe he'd change a data structure specifically designed to prevent false sharing, cutting throughput.

Good code doesn't always mean approachable code, writing a decent socket server in C assumes a ton of advanced knowledge.

If Sally had to write, say, a C logging library it would be a masterpiece of simplicity. These days your code has an audience, and those audiences can vary quite greatly.

DanWaterworth · on July 6, 2016

I consider myself an "advanced coder", whatever that means, and so I know that an IRC server is likely to be IO bound. I also know to avoid C, especially multithreaded C. Not because I would be unable to write correct multithreaded C, but because I probably don't have to in order to solve the problem adequately.

My experience means I know to pick my battles.

zamalek · on July 7, 2016

Disclaimer: despite all my praises for Go, I don't actually use it for real problems. Would I like to use Go? Definitely! Is there a problem that I'm trying to solve that needs Go? Certainly not. In many cases I'd objectively pick Rust over Go, but when it comes to threading primitives, Go is somewhat unmatched.

> pick my battles

Exactly my point. A good coder will choose the tool that expresses the solution correctly. C is a very good choice, it always will be, there are sometimes better choices. Ultimately it seems as though we are in agreement; cheers!

DanWaterworth · on July 8, 2016

Go's threading primitives are not unmatched. You can do the same in Rust [1], but the gold standard in my eyes is STM in Haskell which allows you to block on arbitrary conditions.

[1] https://doc.rust-lang.org/std/macro.select!.html#examples

rantanplan · on July 6, 2016

OT: why is an IRC server I/O bound? That really piqued my interest.

Terribledactyl · on July 6, 2016

I've never made an IRC server but I share a similar feel for I/O (Network) being the limiting factor. Have one user write a message and then you need to send out potentially hundreds of thousands of messages. (At twitch scale)

The actual CPU computation going on per event is minimal (process maybe a few kb of text), and if we're only dealing with text, probably not memory (capacity or throughput) bound and certainly not disk bound.

kctess5 · on July 6, 2016

Correct, fast, and maintainable C code is a very tall order. Good luck finding ANYONE who can do that consistently for a reasonable paycheck.

vvanders · on July 6, 2016

Huh, it's almost as if the foundation of all the devices we use on a daily basis are written in some language that apparently isn't possible to write correctly.

pshc · on July 6, 2016

They are mostly reliable (though probably still full of security holes) despite C—made possible by the millions of person-hours thrown at the problem of maintaining large C codebases.

Back then, we had no choice. We can do better.

DanWaterworth · on July 6, 2016

Essentially all but the most used code-paths will be bug riddled. This paper [1] on finding bugs in C compilers by fuzzing is an interesting case-study.

[1] http://lambda-the-ultimate.org/node/4241

astrange · on July 6, 2016

Server programming isn't even that hard. You could do it in C all day. (Just have good sandboxing for when someone finds an exploit...)

Millions of people have been writing C for games, firmware, computer systems without Internet since the 80s on, and they didn't even get software updates. If your server crashes you get a coredump and you can update anytime you want.

ssalazar · on July 6, 2016

s/isn't possible to write correctly/is difficult to write correctly cheaply in non-trivial applications/

and you pretty much have it. Building higher-level abstractions has historically been a good thing, unless your day job is punching in opcodes.

vvanders · on July 6, 2016

I was more poking fun at the hyperbole that it's not possible to hire anyone who writes C at a reasonable rate.

DanWaterworth · on July 6, 2016

That's basically true.

DanWaterworth · on July 6, 2016

Apparently it didn't have to be C or blazing fast.

star-trek-fleet · on July 6, 2016

I think this is why 'advanced' is used instead of 'better'. :)

DanWaterworth · on July 6, 2016

I think that's true, but I don't like that it's true. What does that say about programming as a profession? Yuck.

topspin · on July 6, 2016

Reading this causes me to experience déjà vu; years and years of reading stories and watching presentations about someone struggling with GC in the JVM. It's happening all over again with Go. The same 'discoveries', the same trade-offs, the same discussions about hardware resources, the same 'concurrent mark and sweep', the same 'more to do' conclusions. You could replace every occurrence of 'Go' with 'Java' and it would probably go undiscovered.

Maybe it's all worth it and this is how developers are supposed to spend their time, but it's no longer interesting to me.

arcticbull · on July 6, 2016

It's because GC is a bad idea that's had 30 years of good research thrown after it. Advancing GC is building a faster horse instead of stepping back and building a car. It's time to move on, and I'm thrilled to see modern languages (Swift, Rust) abandoning it and focusing on building more intelligent compilers.

The goal shouldn't ever be to make the world's best GC, it should be to create the world's best way to elide lifetimes so that developers don't have to think about memory management. GC shouldn't be a goal, it's a technique for solving a problem, one of many that we should explore.

aserafini · on July 6, 2016

Seems like the opposite surely? We should be developing languages that more succinctly address the problems we humans are trying to solve not book keeping for the computer's hardware (that should be the computers job!).

weberc2 · on July 6, 2016

I think you missed the bit where he said "more intelligent compiler". The compiler is the bit that does the bookkeeping, only in Rust (and evidently Swift--I haven't played with it much) it's done statically, ahead of time rather than at runtime.

That said, I think Go is a much more practical language than Rust for most problems. That said, I'm still very excited about Rust.

coldtea · on July 7, 2016

>I think you missed the bit where he said "more intelligent compiler"

Also known as "sufficiently smart compiler": http://c2.com/cgi/wiki?SufficientlySmartCompiler

weberc2 · on July 7, 2016

These are different things. A sufficiently smart compiler is a hypothetical compiler that could theoretically optimize a high level language so that it could be faster than some low level language. This isn't what we're talking about here--we're talking about the concrete ability of the Rust compiler to preclude certain classes of errors.

arcticbull · on July 6, 2016

Yeah, that's exactly what I was trying to say. Rust does all the bookkeeping at compile time, Swift keeps a lot of it at run-time although the compiler can easily optimize away a lot of lifecycle stuff too when it's in scope, so I assume it either does or will.

I agree that Rust likely does not have the be-all answer to automatic memory management, though what I love about it is that they're pushing the boundaries and getting people thinking differently about memory management.

weberc2 · on July 7, 2016

> I agree that Rust likely does not have the be-all answer to automatic memory management, though what I love about it is that they're pushing the boundaries and getting people thinking differently about memory management.

Me too. I intend to use it for more of my hobby things, but Go is currently the best fit. Eventually I imagine Rust will pick up some decent GUI libraries or at least get decent editor support (vim-go is lightyears ahead of YCM+racer) and I'll be able to afford to justify using it more.

gnuvince · on July 6, 2016

What he said is that the goal should be that developers need not handle memory manually. GC is one technique to achieve that goal, and the one that has been the most successful so far, however we should not equate automatic memory management and garbage collection: other techniques could offer an as good or even better experience if we took the time to explore, develop, and refine them.

justinhj · on July 7, 2016

GC is also required for persistent data structures which makes it a must have for languages where immutable data is a fundamental strategy for handling concurrency.

strictfp · on July 6, 2016

Thank you! Finally someone who talks sense.

pcwalton · on July 6, 2016

> Reading this causes me to experience déjà vu; years and years of reading stories and watching presentations about someone struggling with GC in the JVM. It's happening all over again with Go.

It's because GC is an area full of tradeoffs, and despite popular belief, the HotSpot GC is really good. In fact, I honestly don't know of any way to improve on the HotSpot GC for general-purpose use (i.e. throughput/latency balancing). HotSpot has a generational, concurrent, compacting GC; allocation takes 4 or 5 instructions (really!); the compiler has SROA to aggressively optimize out allocations where unneeded.

SEJeff · on July 6, 2016

Look up the Azul Systems pgc. That is more or less the holy grail, but it is patented out the wazoo, and is a commercial implementation only.

Also, the jrockit jvm (which was from BEA and was purchased by Oracle) is actually quite a bit faster than hotspot and easier to introspect (lookup jrockit mission control) than hotspot. I suspect eventually they'll merge however.

pcwalton · on July 6, 2016

> Look up the Azul Systems pgc. That is more or less the holy grail, but it is patented out the wazoo, and is a commercial implementation only.

That's what I was alluding to in the parenthetical. According to the paper, C4 trades off a significant amount of throughput for reduced latency. That's what you want for many applications, and C4 is a great advance for those apps, but throughput is very important for most workloads, so HotSpot's GC ends up yielding a good balance.

SEJeff · on July 6, 2016

Touche, I clearly didn't fully understand your original statement. You're entirely correct in this case.

snaky · on July 6, 2016

So we're finally reaching the point where batch and interactive jobs are clearly separated because of the very different tradeoffs they need. http://www.winestockwebdesign.com/Essays/Eternal_Mainframe.h... indeed.

sievebrain · on July 6, 2016

It's not perfect. Azul's C4 does a lot of work in read barriers, so code that looks intuitively like it should be fast can end up causing "read storms" that bog the code down.

C4 never pauses, and that's impressive. But there's no free lunch. The work the GC would do when the app is paused is sometimes being simply done by the app threads instead.

BenoitP · on July 6, 2016

I heard the read storm problem has been solved by Shenandoah, a GC developed by Christine Flood (who was in the original GC G1 team at Sun in 2001). It is under the RedHat umbrella and should be merged in OpenJDK [1].

Shenandoah uses a forwarding pointer in each object, adding overhead but limiting the problem only to write barriers. Here is Christine commenting on Azul vs Shenandoah [2]

From the talk: average pause is 6-7ms, max is 15ms, and the talk is one year old.

She hints at further developments in a version 2 which would make it entirely pauseless.

She gave another talk at RedHat's DevNation conference a few days ago, but they just won't put the video online arg!

[1] http://openjdk.java.net/jeps/189

[2] https://youtu.be/4d9-FQZZoVA?t=13m11s

astrange · on July 6, 2016

Does Java still have a word in every object allocation for locking it? Adding a 64-bit pointer sounds terribly inefficient.

Did you know Objective-C does locks and retain counting without allocating any extra fields in objects?

sbanach · on July 6, 2016

On HotSpot: There are two bits in the header of every object. This is enough for an object that's never been used as a contended lock: CAS operations on the header can be used to handle the locking and that's that. As soon as you actually block on it, a 'real' lock is created (you can't get around the need for a list of threads to wake up as the lock is released) and the header grows to accommodate it. The process is called 'monitor inflation'. At a later date this might be cleaned up by a 'monitor deflation'.

jdmichal · on July 6, 2016

There's a certain amount of work that has to be done for GC, and that work is going to be done somewhere. The question is just what trade-offs you want.

Don't want compacting? You'll pay for it in allocation.

Don't want pausing? You'll pay for it in application threads.

pcwalton · on July 6, 2016

Precisely, and this is what is so often missed in these discussions. Most of the time, when you see claims of GC silver bullets, there's some hidden downside that's being papered over. Latency wins (i.e. "max pause time" or whatnot) tend to be throughput losses. Less copying results in more fragmentation. Value types can result in more copying, reducing performance over pointer indirections through nursery allocations. And so forth.

weberc2 · on July 6, 2016

I don't know that anyone disputes this. The discussions I participate in don't deny this; they mostly talk about whether or not the tradeoffs result in a net gain (if you sacrifice a little from the minority of cases to gain the same amount in the majority of cases, you do indeed have a net gain).

> Value types can result in more copying, reducing performance over pointer indirections through nursery allocations

Having value types means you can pass by copy, but it also means you can allocate on the stack and pass by reference--in other words, you get performant passing without involving the GC.
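A minimal Go sketch of that point, since Go already has value types. `Vec3`, `Add` and `Scale` are made-up names for illustration. Whether a given value really stays on the stack is up to the compiler's escape analysis, so treat this as the typical case rather than a guarantee.

```go
package main

import "fmt"

// Vec3 is a hypothetical value type: a plain struct with no pointers inside.
type Vec3 struct{ X, Y, Z float64 }

// Add takes and returns values. The structs are copied; nothing here needs
// a heap allocation.
func Add(a, b Vec3) Vec3 {
	return Vec3{a.X + b.X, a.Y + b.Y, a.Z + b.Z}
}

// Scale mutates its argument through a pointer. The pointer does not outlive
// the call, so escape analysis is free to leave the target on the caller's stack.
func Scale(v *Vec3, k float64) {
	v.X *= k
	v.Y *= k
	v.Z *= k
}

func main() {
	a := Vec3{1, 2, 3}
	b := Vec3{4, 5, 6}
	sum := Add(a, b) // pass by copy
	Scale(&sum, 2)   // pass by reference, still no GC-managed object required
	fmt.Println(sum)
}
```

Building with `go build -gcflags=-m` prints the compiler's escape decisions, which is the easiest way to check what actually ends up on the heap.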

cliffc · on July 8, 2016

Nearly all the work you allude to is done in other threads, which indeed consume machine resources (CPU cycles, memory bandwidth). If your application does not burn all cores/bandwidth, then the GC work is all done on the idle/spare machine resources. At the limit, though, you'll indeed have to slow down the application so that GC can play catch-up, and bulk/batch stop-the-world style GCs require less overhead than the background incremental GCs. Cliff

needusername · on July 6, 2016

> C4 never pauses

To my knowledge this is false. AFAIK while the C4 algorithm is pauseless the C4 implementation is not. It's just that the pauses are really short.

sievebrain · on July 7, 2016

My understanding: C4 does not pause but the JVM still does for various other reasons and a part of the work on Zing has been forcing down those pause times too.

cliffc · on July 8, 2016

Sorta kinda all of the above. C4 the algo has no pauses, but individual threads stop to crawl their own stacks, i.e., threads stop doing mutator work, but only 1-by-1, and only for a very short partial self-stack crawl. C4 the impl, I believe, now has no pauses either. HotSpot the JVM has pauses, and yes, much of the Zing work was on forcing these pause times down. Cliff

jstimpfle · on July 6, 2016

Isn't it also because Java and/or having GC makes certain patterns easy although they should be hard? At least my limited experience with Java, from writing a system which dealt with millions of integers, is that Java really wants you to use ArrayList<Integer> instead of the GC-friendly int[].

Skinney · on July 6, 2016

The problem with Java is that most things are a pointer, which means the GC has to deal with it. Go on the other hand allows the user to specify which things should be a value and what should be a pointer, which significantly decreases stress on the GC.

C# has something called value types, and while this helps (and Java is working on implementing something similar for Java 10) it's not as flexible as Go, where users can decide this at whim instead of specifying it in the type.

sievebrain · on July 6, 2016

Java doesn't "want" you to use ArrayList<Integer>, that's merely more convenient if you need a dynamically sized array and don't want to do the resize yourself.

But the JVM folks are adding support for ArrayList<int> to the language, with the efficiency you'd expect from it.

tmd83 · on July 6, 2016

Value types will hopefully be a big help in making the JVM GC better once the SDK and popular libraries fully utilize them, both by reducing memory pressure and by making the heap more GC friendly.

cliffc · on July 8, 2016

Fun stuff I've been doing with the H2O project is basically using nearly-pure Java (some Unsafe) to hold onto numbers with better efficiency than e.g. int[], and giving out an easy-enough-to-use API for writing parallel & distributed code over an Array-like API. i.e., feels "almost like an array", and "for-loops" run at memory bandwidth speeds and also parallel and distributed. Cliff

cliffc · on July 8, 2016

And the actual data is stored in giant byte[] (hidden behind the API), so the GC costs are near zero. Cliff
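A rough sketch of the same idea in Go (not H2O's actual code, which is Java): numbers are packed into one large byte slice behind a small array-like API, so the collector sees a single pointer-free object instead of many small ones. `PackedInts` and its methods are hypothetical names. In Go a plain `[]int64` would be just as cheap for the GC; the byte slice is only there to mirror the `byte[]` trick described above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// PackedInts is a hypothetical column of int64 values stored in one []byte.
// The backing array contains no pointers, so the GC has nothing to trace in it.
type PackedInts struct {
	buf []byte
}

// NewPackedInts allocates room for n values in a single allocation.
func NewPackedInts(n int) *PackedInts {
	return &PackedInts{buf: make([]byte, 8*n)}
}

// Len returns the number of values the column can hold.
func (p *PackedInts) Len() int { return len(p.buf) / 8 }

// Set stores v at index i using a fixed 8-byte little-endian encoding.
func (p *PackedInts) Set(i int, v int64) {
	binary.LittleEndian.PutUint64(p.buf[8*i:], uint64(v))
}

// Get reads the value at index i back out of the byte buffer.
func (p *PackedInts) Get(i int) int64 {
	return int64(binary.LittleEndian.Uint64(p.buf[8*i:]))
}

func main() {
	col := NewPackedInts(1000)
	for i := 0; i < col.Len(); i++ {
		col.Set(i, int64(i)*int64(i))
	}
	fmt.Println(col.Get(10), col.Get(999))
}
```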

vardump · on July 6, 2016

Is the number of garbage objects generated by idiomatic Go and idiomatic Java comparable?

My guess is Go implementation will produce an order of magnitude less garbage.

In Go, an array of structs (= objects) is just one object.

In Java, an array of objects is the array object itself plus one object for each value in the array, except for primitive types like boolean, int, long, etc.
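A small Go sketch that makes the difference measurable. It counts heap allocations for a slice of structs versus a slice of pointers to structs, the latter being roughly the layout a Java object array has. `Point`, the sinks and `countMallocs` are names invented for this example, and the counts come from `runtime.MemStats`, so they can be off by a few if the runtime allocates in the background.

```go
package main

import (
	"fmt"
	"runtime"
)

// Point is a hypothetical small struct used as the element type.
type Point struct{ X, Y int }

// Package-level sinks make the slices escape, so they are really heap allocated.
var (
	valueSink   []Point
	pointerSink []*Point
)

// countMallocs reports how many heap objects were allocated while f ran,
// according to the cumulative Mallocs counter in runtime.MemStats.
func countMallocs(f func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs
}

// fillValues builds a slice of structs: one backing array holds every Point.
func fillValues(n int) {
	s := make([]Point, n)
	for i := range s {
		s[i] = Point{i, i}
	}
	valueSink = s
}

// fillPointers builds a slice of pointers: one backing array plus one heap
// object per element.
func fillPointers(n int) {
	s := make([]*Point, n)
	for i := range s {
		s[i] = &Point{i, i}
	}
	pointerSink = s
}

func main() {
	const n = 10000
	v := countMallocs(func() { fillValues(n) })
	p := countMallocs(func() { fillPointers(n) })
	fmt.Printf("[]Point: %d allocations, []*Point: %d allocations\n", v, p)
}
```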

BinaryIdiot · on July 6, 2016

I completely agree and with so much tuning required in past frameworks for their GC it makes me wonder why more don't simply adopt the C++ / Rust models of resource management.

I remember way back when people said you couldn't use the JVM for real-time applications because of the GC pauses, but it's been improved significantly since then, and now all the same topics are coming up with Go.

Skinney · on July 6, 2016

Because the C++/Rust way of memory management is better for some things, but worse for others. I've worked with several different projects during my career, and not once did we require manual memory management. A GC based language was simpler for us to use, and the few times we had problems with GC, they were possible to overcome by writing better code, as is the case with any language.

This is not to say that no project benefits heavily from C++/Rust. But I would argue that for many, GC is the best trade-off.

unscaled · on July 7, 2016

I completely agree that explicit memory management (I wouldn't call it manual) in the C++/Rust way is a cognitive overhead you don't want for a great deal of the software work - perhaps most of it.

But there are definitely projects that require explicit memory management, and it's not just games and realtime software. Often high-performance backend code in Java and Go just ends up using object pools instead of reallocating objects, just as the OP described.

With Go specifically we've seen the rise of fasthttp, which just adds completely manual memory management in the 90's C++ fashion. Want to create a new request object?

    req := AcquireRequest()
    req.DoSomething()
    ReleaseRequest(req)

Compare to C++98:

    Request* req = new Request();
    req->DoSomething();
    delete req;

And now you're back at the same manual memory management problem modern C++ and Rust are striving to solve.
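For readers who haven't seen the pattern, here is a minimal sketch of an acquire/release pool built on the standard library's `sync.Pool`. The `Request` type and both functions are hypothetical and only mimic the shape of the calls above; this is not fasthttp's code. The comment on `ReleaseRequest` is exactly where the manual-management hazard lives.

```go
package main

import (
	"fmt"
	"sync"
)

// Request is a hypothetical pooled object, not fasthttp's type.
type Request struct {
	Path string
	Body []byte
}

// requestPool hands out recycled Requests and allocates a new one only
// when it has nothing to reuse.
var requestPool = sync.Pool{
	New: func() interface{} { return new(Request) },
}

// AcquireRequest returns a Request that the caller owns until it is released.
func AcquireRequest() *Request {
	return requestPool.Get().(*Request)
}

// ReleaseRequest resets req and returns it to the pool. Touching req after
// this call is a use-after-release bug that the compiler will not catch.
func ReleaseRequest(req *Request) {
	req.Path = ""
	req.Body = req.Body[:0] // keep the backing array so it can be reused
	requestPool.Put(req)
}

func main() {
	req := AcquireRequest()
	defer ReleaseRequest(req)

	req.Path = "/hello"
	req.Body = append(req.Body, "payload"...)
	fmt.Println(req.Path, len(req.Body))
}
```

Note that `sync.Pool` may drop idle objects at any time, so it reduces allocation pressure but does not guarantee reuse.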

kasey_junk · on July 7, 2016

I'd argue that one of the biggest differences between golang and java is not technical but cultural. That is, the golang idioms and thus the std libs are quicker to reach for things like object pools and other performance "hacks". Even the std http library uses an arena in golang.

Similarly, high performance Java libraries like the Disruptor, SBE or Chronicle look very much like C code.

Personally, that doesn't bother me, as it allows you to write your hot path and your non-optimized path in the same language with the same tooling.

says the guy who has split JVMs across processes for performance and contemplated doing it per core

Skinney · on July 7, 2016

For what, I would assume, is a minor portion of your total lines of code.

BinaryIdiot · on July 6, 2016

I'm not sure where you got manual memory management from. I was strictly referring to RAII. Manual memory management is such a pain but the C++ and Rust ways are very similar with RAII.

Skinney · on July 7, 2016

I usually refer to everything that isn't GC as 'manual'. RAII is an abstraction on top of manual management, you still need to decide what type of pointer/lifetime the allocated object should have, making it manually managed, IMHO.

RAII of course deals with more than just memory, but in a thread about GC I assumed it was memory management you referred to.

coldtea · on July 7, 2016

>Maybe it's all worth it and this is how developers are supposed to spend their time, but it's no longer interesting to me.

Unless you design and implement GCs yourself, it's not supposed to be interesting to you anyway. It's just something that will benefit users of the language, not something to excite them.

topspin · on July 7, 2016

If it's not supposed to be interesting, why do so many who have found themselves running up against the limits of GC in their chosen language/platform/implementation end up writing epic blog posts that represent months or years of work and/or presenting their hard-won solutions at conferences? There certainly seem to be a lot of people who end up having to be very interested in solving their GC problems once their systems grow to non-trivial size, and they all seem to be relearning and resolving the same set of problems.

coldtea · on July 9, 2016

>If it's not supposed to be interesting, why do so many who have found themselves running up against the limits of GC in their chosen language/platform/implementation end up writing epic blog posts that represent months or years of work and/or presenting their hard-won solutions at conferences?

Because they care about improving actual, existing, languages, with actual, existing, ecosystems, not doing cutting edge academic memory management research.

>and they all seem to be relearning and resolving the same set of problems.

So like architects relearn and resolve the same problems, about building bridges, skyscrapers, condos etc -- instead of designing some new structures to replace them?

loeg · on July 6, 2016

Go is Java 2.0, but from Google instead of Sun. At least the syntax is a little nicer and less boilerplated.

fishnchips · on July 6, 2016

I sincerely hope Go does not go in that direction. The 'less is more' approach is so far very strong among the Go steering committee.

Gankro · on July 6, 2016

The cruel irony here is that simplicity was a foundational goal and major rallying point for Java. Here's a website from 1997 describing it: http://www.cafeaulait.org/course/week1/16.html

Ignoring the last part that obsesses over the glory of OOP, replacing Java with Go in that page is... pretty spookily familiar!

fishnchips · on July 7, 2016

Maybe it's a bit too early to judge but IMO Go has not introduced any new language complexity since its public launch.

spriggan3 · on July 6, 2016

> I sincerely hope Go does not go in that direction. The 'less is more' approach is so far very strong among the Go steering committee.

There is no "less is more" approach in Go. It's more like you can't write something really complex in Go, so people use it for trivial things like servers that do almost nothing aside from de-serializing JSON. Try writing a large LOB app in pure Go or a fully featured CRM, and see if you can get away with "less is more" when you need to reason about complex business rules, data validation, complex routing, mapping RDBMS data to values, and whatnot. "Less is more" is a mirage. Go's shortcomings will show up pretty fast.

fishnchips · on July 6, 2016

All you're really saying is that Go is not great for web apps. I admit, it is not. Neither is C. I would not write a CRM in either of these languages.

At codebeat (codebeat.co) we use Go for our backend - very CPU-heavy, complex static analysis workflows. Our frontend is in Rails which is not ideal but probably the best bang for the buck for an early stage startup. This is the beauty of having many tools to choose from.

spriggan3 · on July 6, 2016

> All you're really saying is that Go is not great for web apps. I admit, it is not. Neither is C.

C is 30 years old, so it has an excuse. Go has none. The fact that it's extremely difficult to write a classic, complex webapp in Go is proof that this language has serious flaws.

fishnchips · on July 6, 2016

It is just as easy as doing it in Rust or Swift - both being thoroughly modern languages. It's more about the lack of comprehensive frameworks than the language being somehow flawed. I used Go as a backend for mobile apps and it was OK but where it really shines is the kind of workload we do when analysing source code: where you need excellent performance and low memory footprint, all that while keeping the code readable.

koffiezet · on July 8, 2016

> It's more like you can't write something really complex in Go

Have you recently checked out the bigger projects that are currently being written in Go? You'd be surprised...

> Try writing a large LOB app in pure Go or a fully featured CRM.

Woah. Have you tried doing that in C, C++ or Rust? Has anybody? Every language has its strengths and weaknesses. Sure, it's possible to do so in them, but is it a good idea? Not necessarily. I'm not going to write a database engine in Python, but we have time-series databases being written in Go.

> Go's shortcomings will show up pretty fast.

Every language has shortcomings. The major thing seen as a shortcoming of Go is the classic "lack of generics", which arguably is true to some extent, but not its GC. The thing is, Go's strong points became clear long before these shortcomings you're talking about. The entire ops space jumped on it because it solved a few problems plaguing their tools: memory overhead, slowness, dependencies, and being hard to make portable. Pretty much every major new project related to infrastructure is created in Go.

One of the biggest attractions of Go is its ability to create programs that perform a lot better than the same thing written in Ruby or Python, which in turn allows developers to undertake more ambitious projects.

buddhu · on July 6, 2016

So does that mean Go will finally have generics somewhere around version 5?

fishnchips · on July 6, 2016

Fingers crossed for the Hindley–Milner type system.

egeozcan · on July 7, 2016

Even maybe algebraic data types? One can keep dreaming.

karma_vaccum123 · on July 6, 2016

Then we might be in some alternate reality talking about how Twitch could never deliver a viable service because the developers kept creating segfaults. C is the last language I would choose in a race to a viable service.

coldtea · on July 7, 2016

>C is the last language I would choose in a race to a viable service.

And yet, the majority of services the world relies on everyday, from OSes and drivers, to Google search, NASA code, medical devices, etc are written in C/C++.

joejev · on July 6, 2016

Too bad no one can deploy linux servers because of all the segfaults.

serge2k · on July 6, 2016

I'm flabbergasted that people are able to post to this thread at all, what with all the segfaults their C++ based browsers must be experiencing all the time.

tracker1 · on July 7, 2016

I experienced quite a few kernel panics in the earlier days... Generally from poor drivers, but just the same.

jjnoakes · on July 6, 2016

At scale it may be more efficient to write the system in a higher level language (saving time) and then spending some time tuning only the slowest parts, instead of building everything from the start to be highly optimized, even the parts which may not need it (investing time where it may not produce results).

And the improvements to Go that they drove will help everyone.

I happen to prefer C, but I understand why they did it the way they did it.

daenney · on July 6, 2016

Better in what way? Performance wise, perhaps but that's only one aspect of why someone might pick one language over another.

Even distilling better down to just the max throughput you can get for a solution in a language vs another is hard to do as a lot also depends on how the code ends up being written and how easy you want to be able to debug that solution. You can solve this stuff in C many ways with different performance characteristics.

_ondq · on July 6, 2016

In some ways it's accurate to think of Go as a more convenient version of C with modern facilities like automatic memory management, concurrency primitives and data structures (i.e. maps), with the minimal level of runtime scaffolding included to support them. Interop with C is very easy, and Go is miles away (stylistically) from some of the more esoteric and abstract languages that are used these days.

jbooth · on July 6, 2016

Interop with C is easy to code but deceptively expensive from a machine standpoint, due to correctness guarantees when you're switching from green threads to no-green-threads. It involves interacting with the go scheduler and possibly/probably blowing your cache and TLBs.

And the automatic memory management is great, but the above commenter was saying that if you're going to extreme lengths to work around the automatic memory management, maybe you needed a non-GC language in the first place.

Keyframe · on July 6, 2016

Which ones though? How about Rust? To me, as a C programmer, Go looked like the closest one to my taste, but not enough or blitz to switch/consider.

weberc2 · on July 6, 2016

C is not a very nice language for concurrent programming.

anonymoushn · on July 6, 2016

If you want to hire 300 people to write reliable software in a language they don't know yet, Go is a good choice. You might also have like half a dozen people who are so deep into Go that they do the stuff in this post.

coldtea · on July 7, 2016

>Go is a good choice

Until you ask them to do stuff with channels -- where Go offers 100s of subtle ways to shoot yourself in the foot.
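To make "subtle" a bit more concrete, here is a small sketch of three channel behaviours that regularly surprise newcomers: a nil channel never becomes ready, a closed channel still yields its buffered values and then reports closed forever, and closing a channel twice panics. `recvStatus` and `closeTwice` are helper names invented for this example.

```go
package main

import "fmt"

// recvStatus reports what a non-blocking receive on ch observes.
func recvStatus(ch chan int) string {
	select {
	case v, ok := <-ch:
		if !ok {
			return "closed"
		}
		return fmt.Sprintf("value %d", v)
	default:
		return "would block"
	}
}

// closeTwice closes ch twice and reports whether the second close panicked.
func closeTwice(ch chan int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	close(ch)
	close(ch) // closing an already closed channel panics
	return false
}

func main() {
	// A nil channel never becomes ready; without the select default,
	// this receive would block forever.
	var nilCh chan int
	fmt.Println("nil channel:", recvStatus(nilCh))

	// A closed channel first drains its buffer, then reports closed forever.
	ch := make(chan int, 1)
	ch <- 7
	close(ch)
	fmt.Println("after close:", recvStatus(ch))
	fmt.Println("after drain:", recvStatus(ch))

	fmt.Println("double close panicked:", closeTwice(make(chan int)))
}
```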

ben_jones · on July 6, 2016

This may be anecdotal, but Twitch is an example of a service that just bloody works. I've been a user for a while and I've yet to notice any service disruptions or issues. They were also one of the largest early adopters of EmberJS, pretty sure it was well before the 1.0 release when many bugs were still being worked out and the API suffered frequent changes, so hats off to the engineering team for continued awesome work.

r1ch · on July 6, 2016

Twitch has a fairly high number of outages, although not all affect video playback (eg API outages). Most recently the whole site was down for about an hour from EU due to a botched CDN setup. I have a status tracker that monitors from four locations, https://twitter.com/TwitchStatus

srpablo · on July 6, 2016

I'll always be a little bitter about their VODpocalypse and retroactively muting streams with copyrighted music.

Software should serve people and they eradicated countless memories/achievements, eliminated a priceless historical record.

I don't mean to diminish how untenable the previous situation was, and I'm sure I'm underestimating the difficulty/cost of what they ended up doing. I appreciate their work, engineering, and use the service regularly. But it's an "Our Incredible Journey" part of their story and I don't want to let them off the narrative hook for it. They made ~$1b on this content, after all.

hdra · on July 6, 2016

Maybe if you have a fast internet connection. I'm on an HSDPA+ connection and Twitch is unusable, not even the VODs. Then again, YouTube and Vimeo are pretty much the only sites from which I can watch video streams smoothly.

Dobbs · on July 6, 2016

Mind sending me details? I can be reached at tarrant@twitch.tv. Information that can help are things like who your ISP is, what your specific IP is, where you are located. Possibly a traceroute to live.twitch.tv.

We really do care about the QOS of users and are constantly working to improve service to users everywhere in the world. Sadly, there are many constraints outside of our control that can cause bad service. The information our users are kind enough to provide to us can often help us identify problems and reduce issues.

predakanga · on July 6, 2016

I'm curious, have you tried viewing the streams or VODs through an alternate player?

I've found that I can almost always improve the smoothness of their content by using Livestreamer[0] to play it in VLC (or Kodi, more often)

[0]: http://docs.livestreamer.io

83457 · on July 7, 2016

I use this on my netbook. Pretty much unwatchable otherwise

nindalf · on July 6, 2016

What's your location? I've found the same - I can stream HD on Youtube gaming without an issue but not anything better than Twitch Medium. Sometimes I need to downgrade to low or mobile quality to keep up. I figured it was because of my location (not EU or NA).

lossolo · on July 6, 2016

It doesn't have anything to do with the topic. It's quality of their network or quality of yours that is the main reason here. Not software.

asdf1234 · on July 6, 2016

The video service almost always works. The website has issues pretty regularly.

doodpants · on July 6, 2016

I've never successfully watched a video on Twitch. I get a black rectangle with playback controls, and when I press play, nothing happens. Disabling AdBlock doesn't help.

dylz · on July 6, 2016

You have flash allowed on all domains without click to play?

doodpants · on July 7, 2016

I don't have Flash installed at all. Does the site require Flash?

If so, then that's news to me; other video sites that require Flash usually a) show a "you need Flash" message in the place where the video would show, and b) don't show playback controls, because they're part of the Flash component itself. Also, I never saw any mention of Flash in any of the site's help/troubleshooting/FAQ documents.

dylz · on July 8, 2016

It requires Flash, but it uses apple HLS.

I think Safari (+iOS) works without Flash, but everyone else is relegated to the Flash player.

Controls are in HTML5; just the actual video handler appears to be Flash.

In the past (probably 2? years ago) the entire player including controls was part of flash.

doodpants · on July 12, 2016

Good to know, thanks.

Since posting my last message, I looked at the documentation on the website again, and saw that it claimed that you could use the site on iOS and Android by just using an ordinary browser. So I tried visiting it in Safari on my iPhone, and the videos worked. Then I tried using Firefox on my Kindle Fire, and the videos didn't work. But they do have a dedicated Twitch app for the Kindle, so I downloaded that. So now I have a way of watching Twitch videos. :-)

The fact that Flash is not mentioned anywhere on the site as being a requirement seems like a glaring omission.

dylz · on July 14, 2016

"ordinary browser" = Chrome for Android, latest version, Safari for Mac, Safari for iOS

Firefox does not support HLS in either desktop or mobile version. It requires Flash as a last-ditch fallback on any platform that doesn't support HLS, afaik.

anonymousDan · on July 6, 2016

So how does the GC performance of Go compare to something like Java/the Hotspot JVM?

_ph_ · on July 6, 2016

The approaches to GC are difficult to compare, and Java offers a selection of garbage collectors. Overall, the Java collectors are very sophisticated and have been tuned over years, so in principle they are excellent. The downside is that the Java language itself puts a lot of stress on the GC. The biggest problem is that Java offers no "value" types beyond the built-in int, double, and so on, so everything else has to be allocated as a separate object and pointed to via references. The GC then has to trace all these references, which takes time. While a collection of the youngest generation in Java is extremely fast, a global GC can take quite some time.

Go, on the other hand, has structs as values, so the memory layout is much easier for the GC. Go always performs full GCs, but they mostly run in parallel with the application; a GC cycle only requires a stop-the-world phase of a few milliseconds (for multi-gigabyte heaps).
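If you want to check the pause numbers on your own workload, the runtime exposes them. The sketch below forces a collection and reads the most recent stop-the-world pause from `runtime.MemStats`; `lastPause` and `sink` are names made up for the example, and the tiny allocation loop is just a stand-in for a real application.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// sink keeps the allocations below alive so the collector has a live heap to scan.
var sink [][]byte

// lastPause forces a GC cycle and returns the number of completed cycles
// together with the duration of the most recent stop-the-world pause.
func lastPause() (cycles uint32, pause time.Duration) {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// PauseNs is a circular buffer; the runtime documents the most recent
	// pause as living at index (NumGC+255)%256.
	return m.NumGC, time.Duration(m.PauseNs[(m.NumGC+255)%256])
}

func main() {
	// Stand-in workload: roughly 64 MB of live data.
	for i := 0; i < 1000; i++ {
		sink = append(sink, make([]byte, 64<<10))
	}
	cycles, pause := lastPause()
	fmt.Printf("GC cycles: %d, most recent pause: %v\n", cycles, pause)
}
```

Running any Go program with `GODEBUG=gctrace=1` prints similar per-cycle timings without code changes.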

All these numbers of course depend a lot on what your application is doing, but overall Go seems to be doing very well with its newest iterations of the GC.

_0w8t · on July 6, 2016

Another problem with Java is the inability to return multiple values. For that, one often creates a wrapper object holding the results. The JVM can recognize this pattern and stack-allocate those wrapper objects, but this does not always happen, which increases GC pressure.
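For contrast, this is roughly what the same situation looks like in Go, where a function can return several values directly and no wrapper object is involved. `minMax` is a made-up example function.

```go
package main

import "fmt"

// minMax returns the smallest and largest element of xs. The results are
// returned as plain values; ok is false when xs is empty.
func minMax(xs []int) (lo, hi int, ok bool) {
	if len(xs) == 0 {
		return 0, 0, false
	}
	lo, hi = xs[0], xs[0]
	for _, x := range xs[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	return lo, hi, true
}

func main() {
	lo, hi, ok := minMax([]int{3, -1, 8, 5})
	fmt.Println(lo, hi, ok)
}
```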

pron · on July 6, 2016

The lack of custom value types has ramifications not only for GC, but for cache behavior. Which is why there's serious work on custom value types for Java; it's the major feature planned for Java 10.

Of course, most of the old-gen GC work in G1 is also done in parallel with the application, too.

needusername · on July 7, 2016

> Of course, most of the old-gen GC work in G1 is also done in parallel with the application, too.

Did you want to write concurrently? If so that would be wrong because evacuation can't be done concurrently with the application in G1, only initial marking.

pron · on July 7, 2016

I didn't say that all work is done concurrently with the app. How much work needs to be done in the STW phase is application-dependent. It is likely that if the application exhibits transactional behavior, namely that objects are created at the beginning of a transaction and are all reclaimed at the end, there's very little compaction required, as entire regions are likely to be completely free.

Cyph0n · on July 6, 2016

Another strong point for the JVM is the availability of alternate implementations from several vendors. Is this possible for Go say in the future?

_ph_ · on July 6, 2016

It is certainly possible. There are already two Go implementations: the official one and a gcc-based one. And because the whole Go implementation is available under a BSD license, anyone can fork a custom Go implementation without any license worries.

lloeki · on July 6, 2016

You forgot GopherJS.

arcticbull · on July 6, 2016

There's also llgo.

sagichmal · on July 6, 2016

Many view this as an overall negative point, particularly for those who are tasked with running complex JVM applications without deep operational knowledge...

jerf · on July 6, 2016

This observation has fed into the Go team's design philosophy; they're doing their best to minimize the "knobs" the GC has, because tuning them is inevitably a black art. As far as I know, there's still just one right now, GOGC, documented in the third paragraph of https://golang.org/pkg/runtime/ .
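For reference, the same knob can also be turned from inside a program via `runtime/debug.SetGCPercent`, which returns the previous setting. The default is 100, meaning a collection is triggered once freshly allocated data reaches 100% of the live data left after the previous collection. The helper `withGCPercent` below is a hypothetical wrapper, not part of the standard library.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// withGCPercent runs f with the GC target percentage set to pct and then
// restores whatever value was in effect before. It returns that earlier value.
func withGCPercent(pct int, f func()) (previous int) {
	previous = debug.SetGCPercent(pct)
	defer debug.SetGCPercent(previous)
	f()
	return previous
}

func main() {
	prev := withGCPercent(50, func() {
		// Work done here is collected more eagerly: lower pct trades CPU
		// time for a smaller heap.
		_ = make([]byte, 1<<20)
	})
	fmt.Println("previous GOGC setting:", prev)
}
```

Setting the environment variable `GOGC=50` before starting the process has the same effect for the whole run.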

sievebrain · on July 6, 2016

Yes, but HotSpot G1 is meant to be usable with only a single knob too (target pause time). Other knobs do exist, but only for unusual cases where you want to precisely control the GC's operation to work around some bad app/gc interaction, for instance. And Go lacking such knobs is probably not really a feature: it's not like the Java guys set out from the start saying "we will build a complicated and hard to configure GC". It's just that as you work through more and more customer cases, these knobs become valuable for the hard ones.

jerf · on July 6, 2016

The point is not that lacking knobs is a feature. The point is that the designers are well aware of the issues and they are explicitly making it a goal that knobs should be unnecessary. (Especially since it has had some knobs off and on in the various versions, as mentioned in the article.)

This is in contrast to something we've probably all done at one point or another, which is just to add a checkbox to avoid having an argument about what the behavior should be. They're committing to having the argument out instead of "just adding knobs".

They also have a track record of, for better or worse, just refusing to add knobs and telling you to either do without or use a different language. If you've got an intensely GC-based workload, I'd consider using something other than Go. (However, bear in mind what may be an intensely GC-based workload in Java may not be in Go, since Go has value types already.)

pcwalton · on July 6, 2016

HotSpot cares a lot about proper defaults too. I don't think that there's a significant philosophy difference between HotSpot and Go there. The philosophy difference is, as you say, that Go is opposed to adding configuration options, while HotSpot does have those options (per customers' requests).

tmd83 · on July 6, 2016

I have seen how G1 is supposed to have this one flag, but I often get a bad feeling about G1 (without really having used it much). It seems to reduce average GC pauses but performs really badly in the really low (CMS) range. One thing that looked bad to me is that originally they believed it could completely ignore the generational hypothesis, and then had to bring that back when they found the performance bad. There are also other issues, like cross-region links, that it doesn't handle well. It seems to me that they thought their regional idea was a silver bullet and are now tweaking it all over the place. It is probably a nice GC, but I don't think it's really the GC to end all other GCs like Oracle wants it to be.

nickpsecurity · on July 6, 2016

I get what they're doing but it's a false dichotomy and therefore wrong. The dichotomy being it has to be one, ultra-simple GC or what Java was doing.

https://stas-blogspot.blogspot.com.au/2011/07/most-complete-...

It's well-known after over a decade of research and deployments in GC's that certain styles match certain workloads better. So, multiple ones should be available. This can be a small number that are largely pre-built with sane settings. What's left to tune can likewise be small: pause time, max memory, or whatever. There can also be a default as in current Go that covers 95% of apps well. The result is that specific apps or libraries if they went that far can have GC well-suited to their requirements with about one page of HTML describing what those GC's do and how to choose them.

That's what they should do. It will be easy for them and for developers. Nothing like the JVM mess. It still avoids one-size-fits-all: the longest-running failed concept in IT. Meanwhile, I can't wait to see someone make a HW version of their GC like I've seen in LISP and RT-Java research. It would be badass given the current metrics. It would allow a whole OS to be memory managed, like A2 Bluebottle Oberon, without a performance penalty.

indolering · on July 7, 2016

Can hardware accelerated GC be generalized enough to make it useful? Isn't that what killed previous efforts?

nickpsecurity · on July 8, 2016

Previous efforts got killed because the off-brand hardware, especially the CPU's, were never as fast and/or cheap as Intel/AMD. They also required new tooling and such most of the time. This happened to LISP machines and apparently Azul's Vega's as they're pushing SW solution these days. So, that's my guess.

Most general I saw was in a Scheme CPU where the designer put the GC in the memory subsystem. The Scheme CPU would just allocate and deallocate memory. The GC tracked what was still in use on its own in concurrent fashion. Like reference counting I think. Eventually, it would delete what wasn't needed. Pretty cool stuff.

bad_user · on July 6, 2016

I don't see how it can be a negative. The availability of multiple vendors has given us commercial solutions tuned for particular needs.

For example Azul's C4 garbage collector which they claim is pauseless: https://www.azul.com/resources/azul-technology/azul-c4-garba... ; a pauseless GC is great if you want to tackle real-time systems. For real-time systems actually most garbage collected platforms are unsuitable.

But even more problematic is that stop-the-world latency is directly proportional to the size of the heap memory and today's mainstream garbage collectors cannot cope with more than 4 GB of heap memory without introducing serious latency that's measured in seconds. Think about that for a second - with most GC implementations you cannot have a process that can use 20 GB of RAM, which is pretty cheap these days btw. So keeping a lot of data in memory, like databases are doing, is not feasible with a garbage collector.

sedachv · on July 6, 2016

> For example Azul's C4 garbage collector which they claim is pauseless: https://www.azul.com/resources/azul-technology/azul-c4-garba.... ; a pauseless GC is great if you want to tackle real-time systems. For real-time systems actually most garbage collected platforms are unsuitable.

As far as I can tell Azul's collector claims to be pauseless because they use the new x86 nested page tables (https://en.wikipedia.org/wiki/Second_Level_Address_Translati...) to implement a read barrier (interesting aside: this means it should be possible to implement a read barrier on CPUs without nested page tables by moving the GC into the kernel). Here is an interesting discussion: http://stackoverflow.com/questions/4491260/explanation-of-az...

That still does not mean that C4 is necessarily real-time. You have to take a fundamentally different approach to GC to guarantee real-time bounds (see these papers on the Metronome collector: http://researcher.watson.ibm.com/researcher/files/us-bacon/B... https://www.cs.purdue.edu/homes/hosking/690M/ft_gateway.cfm....) and that comes with a restriction that ties your program's allocation rate to the scheduling of the GC. I am still skeptical about this - it is easy to imagine coming up with an adversarial allocation pattern that breaks time bound guarantees because of some detail of the GC implementation, so both the algorithm and every implementation will need proofs.

> So keeping a lot of data in memory, like databases are doing, is not feasible with a garbage collector.

It is very feasible if you do not make garbage. Either mmap some memory that the GC won't touch or pre-allocate large arrays of primitive types.
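
A minimal Go sketch of the second option, assuming a hypothetical `store` type (not from any library): the data lives in two pre-allocated slices of primitive types. Because those element types contain no pointers, Go's collector has nothing to trace inside them, and because the slices never grow, steady-state operation creates no garbage.

```go
package main

import "fmt"

// store is a hypothetical fixed-capacity key/value table. Both backing
// slices hold plain numbers (no pointers), so the GC does not need to
// scan their contents.
type store struct {
	keys   []uint64
	values []float64
	n      int // number of slots in use
}

// newStore allocates all memory up front; nothing is allocated afterwards.
func newStore(capacity int) *store {
	return &store{
		keys:   make([]uint64, capacity),
		values: make([]float64, capacity),
	}
}

// put appends an entry and reports false when the table is full,
// rather than growing (which would allocate and create garbage).
func (s *store) put(k uint64, v float64) bool {
	if s.n == len(s.keys) {
		return false
	}
	s.keys[s.n] = k
	s.values[s.n] = v
	s.n++
	return true
}

// get does a linear scan; a real table would use hashing or sorting.
func (s *store) get(k uint64) (float64, bool) {
	for i := 0; i < s.n; i++ {
		if s.keys[i] == k {
			return s.values[i], true
		}
	}
	return 0, false
}

func main() {
	s := newStore(1 << 20)
	s.put(42, 3.14)
	v, ok := s.get(42)
	fmt.Println(v, ok)
}
```

The linear scan is only there to keep the sketch short; the point is the layout, not the lookup strategy.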

Cyph0n · on July 6, 2016

You also have Excelsior[0], which provides full Java AOT (ahead-of-time) compilation to native machine code.

[0]: https://www.excelsiorjet.com/

rogerdpack · on July 6, 2016

Is the output code faster than HotSpot after it's been warmed up?

sievebrain · on July 6, 2016

Eh, HotSpot can handle heaps of hundreds of gigabytes with pause times in the 100msec range. It takes a bit of tuning but can be done with the basic open source code.

bad_user · on July 6, 2016

What HotSpot are you talking about? I assume you aren't talking about the Serial, or the Parallel GC or about CMS, which are the older generation, but about G1, right?

Well, I have extensive experience with tuning G1. G1 is a good GC, capable of low latency incremental pauses.

The problem is that with a stressed process, at some point G1 still falls back to a full freeze-the-world mark-and-sweep. For 50 GB I've seen the pause last for over 2 minutes!

mioelnir · on July 7, 2016

2 minutes is cute. If you stress a CMS setup hard enough that the young generation is completely full, it will allocate directly in the old generation. This of course screws the full gc heuristic totally, up to the point where the GC is started too late and you fully run out of memory. At which point the JVM drops down to a single threaded oldschool serial GC as last line of defense. On a 96GiB heap, that thing can take hours; all stuck 100% on a single cpu with even signal handling suspended. Fun times.

That said, for heaps above 32ish GiB, we still go with our tuned CMS settings and overcommit one or two additional memory modules. It's a lot cheaper than the time it takes trying to tune in G1 on a large heap with a lot of gc pressure.

rbjorklin · on July 6, 2016

Got any sources for that? I would be very interested in these tuning parameters and some explanation of what they do! :)

coredog64 · on July 6, 2016

Cassandra works around this by pushing some of that responsibility onto the OS's disk-caching mechanism.

Thaxll · on July 6, 2016

No it won't, I asked and got an answer from Brad Fitzpatrick:

https://www.reddit.com/r/golang/comments/4pmlv9/transaction_...

_ph_ · on July 6, 2016

The link you posted was about switchable GCs in the official Go runtime, which won't be there, but the question was whether there are multiple Go implementations.

needusername · on July 6, 2016

Is there any data from production systems available that confirms that this (the lack of value types) is an issue in most/many real-world applications? In the allocation profiles of the applications I have seen, most allocations in Java programs seem to come from strings, often in logging, or from byte array buffers. Value types would not help here, but compressed strings would.

nvarsj · on July 6, 2016

A significant drawback of the HotSpot JVM is the amount of memory required for even simple apps. At least 64Mi for the simplest, and typically much higher. A typical web app with 1000 request threads will use something like 1.5Gi of memory (512Mi heap, 1Gi for thread stacks, classes, etc).

Golang apps tend to happily run with less than 100Mi, so are well suited as daemon processes that don't get in the way.

However, if you need to support a large amount of dynamic state (> 1Gi), the HotSpot GC is very difficult to beat.

sievebrain · on July 6, 2016

Memory usage in Java can be misleading. Some versions of the JVM will happily take ALL your free RAM if they think it's sitting there unused, because there's a RAM/CPU tradeoff in garbage collected systems: the less frequently you GC, the less CPU time you burn and the faster the app runs.

If your machine actually does have gobs of free RAM, it therefore makes sense for Java to use all of it.

If your machine has gobs of free RAM you were planning on using for something else after your Java app started, well, that's something the JVM couldn't know. Some versions (on Windows?) monitor free memory and adjust their own usage down if you seem to be consuming the headroom, but on other platforms, you just have to tell Java it's got a limit and can't go beyond it.

102030485868 · on July 7, 2016

Unless you specifically tell the JVM otherwise using the -Xmx flag.

See: https://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/jrd...

geodel · on July 6, 2016

Technically the HotSpot GC might do more work in the same amount of time, but Go's GC makes some performance guarantees, like a <10ms STW phase, which HotSpot does not claim or offer for large heaps.

pcwalton · on July 6, 2016

HotSpot does offer that. It's basic functionality that all incremental or concurrent garbage collectors offer. You can adjust the max pause time with -XX:MaxGCPauseMillis.

pas · on July 8, 2016

That's a target. (GC ergonomics)

It won't guarantee it, it just tries to size things (eden space, survivor spaces) and time things to meet its target.

But it's a fickle beast. And usually it requires a lot of tinkering with the code for it to be able to meet it. And then it's easier to disable ergonomics, set fixed sizes, and just enjoy how blazingly fast CMS is, restart the app every few weeks (CMS heap fragmentation), and try G1 with every new point release, maybe finally it beats CMS.

sievebrain · on July 6, 2016

Yes, but that's because the Go GC doesn't compact, and nobody quotes throughput numbers. Building a slow GC that leaves the heap fragmented but has short pauses is not that difficult; the tricky part starts when you wish to combine all these things together.

geodel · on July 6, 2016

Of course Java has all the technical bullet points checked and may be the superior GC from strictly that point of view. But Go has 2 things upfront, from the users' perspective, which Java lacks.

1. It uses about an order of magnitude less memory than Java.

2. It openly proclaims <10ms STW pauses for GC.

mdasen · on July 6, 2016

Go definitely does not use "an order of magnitude less memory than Java". That would mean that a Go program that uses 1GB of memory would need 10GB in Java.

Google wrote a paper comparing C++, Java, Scala, and Go and definitely did not find that (https://days2011.scala-lang.org/sites/days2011/files/ws3-1-H...).

I like Go and it has many wonderful qualities. Still, it's important to be realistic.

richard_todd · on July 6, 2016

I think for "small-data" programs it does work out to about an order of magnitude of overhead in Java. I have ported several small Java programs to Go and I see it (like 100MB Java vs 8MB Go). One encryption program I coded multiple versions of ran 350k C vs. 1.3MB Go vs. 16MB Java.

Programs holding GBs of data in arrays would look much closer, though, I imagine, as the overhead would be dwarfed by the data itself.

geodel · on July 6, 2016

Possible. But typically idiomatic Java usage patterns with collection types have huge overheads. So unless the Java code is written in a specially memory-efficient way, that memory usage gap should remain.

https://www.cs.virginia.edu/kim/publicity/pldi09tutorials/me...