Go GC: Solving the Latency Problem in Go 1.5 (sourcegraph.com)
179 points by beyang on July 8, 2015 | 128 comments



I've worked with garbage-collected languages (Ruby, Java, Objective-C in the bad old days), automatic reference counting (Objective-C and Swift), the Rust ownership model, and manual reference counting/ownership in C, over about 15 years now.

Having thought about this a lot, I just don't really understand why people continue to work on garbage collection. Non-deterministic lifecycle of objects/resources, non-deterministic pauses, huge complexity and significant memory and CPU overhead just aren't worth the benefits.

All you have to do with ARC in Swift and Objective-C is type 'weak' once in a while (which effectively builds a directed acyclic graph of strong references). With Rust you can get away with just structuring your code in accordance with their conventions.

I'm sure this won't resonate with everyone but I think it's time to walk away from GC. I'm curious, is there something I'm missing? The only true benefit I can think of is reducing heap fragmentation; and there must be a better way to address that.


When you consider that most applications are just CRUD processors for some business logic that very rarely hit performance issues, the use case becomes obvious. GC saves developer time, which is the most significant cost for anything that isn't large scale.

The clerk at the front desk isn't meaningfully impacted if the backend has to GC for 350ms once in a while. But the clerk is impacted when his/her software is missing a ton of features because it was written in a language that exposed too many implementation details and led to the development budget running out.


> the use case becomes obvious

It's only obvious if you assume that writing code without a GC is harder or takes longer.

Have you used a language that supports automatic memory management but without a GC for any length of time?

It's actually very nice.


No, why should I? In Go I've finally found a language that's pragmatic, both fast to run and fast to think and write in. While everyone's discussing its obvious shortcomings, I'm churning out working code faster than ever. Now somebody who isn't me is putting big resources into making the language even better. Where's the reason to complain?


The reason to complain for the other commenters is that you're criticizing something you've never used, while being familiar with only one method of doing things.

Your criticism is not legitimate.


Who said I've never dealt with malloc or ARC (which I think is ill-suited for concurrency)? I basically replied to the OC, giving some good reasons for developing better GC strategies for a language with great support, libraries and a vibrant ecosystem, rather than moving to the next shiny thing. We're developing a latency-sensitive real-time bidder, and I'm more than happy we can do it in Go now, just like the rest of our APIs, thanks to Go 1.5.


Why shouldn't you? One of the best parts of software development is the diversity of ideas. Learning a new way of doing things adds to your potential techniques.

My main job is in a garbage-collected language, but I'm learning Rust and know Swift.


Besides Rust, only Cyclone, ATS and ParaSail offer similar approaches to memory management[0].

It would be nice if Rust became mainstream, but there is still a very long way to go.

[0] D is adding some support for it as well.


C++ also qualifies. std::unique_ptr and std::shared_ptr together form a very nice automatic memory management system.


C++ doesn't force you to use them, while the other languages have their memory models enforced by the compiler, which is why I left C++ out.

In large teams no one can prevent the cowboy coder on the team from going C-style on a large C++ codebase, especially when code reviews and static analysis are not used.


If you don't want to, then don't. It's just generally a good idea to at least understand the alternatives before you proclaim one thing to be better than another.


The thing with ARC is that you mostly only have to use the weak keyword for delegates (and in certain data structures, etc.); it becomes totally second nature and intuitive, and you only have to think about it once in a blue moon. It's actually just as productive. And the worst case with ARC (while admittedly unbounded) is that you forget a weak keyword and you leak some memory.

The problem is that you're asserting it's significantly more burden on developers to use the 'weak' keyword once in a while (totally intuitive, second nature) but that GC doesn't have its own issues. In my experience that's just not true.

In my experience, getting hit once with a resource that just won't get deallocated for some reason because of a non-deterministic lifecycle is just as big a productivity hit for developers. Probably worse, since such issues are much harder to track down than a leak you can find with the static analyzer or the 'leaks' tool.


ARC is pretty great, but it's still very possible to leak memory, and you can actually cause longer pauses when references go out of scope than well-tuned modern GCs can. So there's pluses and minuses, as in all things.


I think the advantage of ARC is not about pause times (which can potentially be longer) but the fact that things are predictable & somewhat easier to debug.

I think it's too bad that the GC/ARC choice also implies a language choice. For example, if we had the ability to choose between GC & ARC in Java, we would be able to get a better understanding of which memory model developers of Java enterprise software end up preferring.



Very interesting, thanks for sharing.

Often, the problem with this kind of academic work is that they don't have a real-world system to gather data from and study different algorithms on. I don't really care if javac (one of the applications they benchmark) takes 10% more or less time. I want to know how much time engineers save when using ARC vs tracing GC over a long period of time (development + maintenance).


You lose the predictability if you get longer pause times. Memory leaks can also severely hurt development time when you spend days or weeks chasing them down (personal experience). You're not immune to memory leaks (space leaks really) in a GC-based system, but it's generally easier to reason about where the system is using memory.

Regarding resources, who depends on lifecycles in a GC-based system to handle resources? try-with-resources in Java is just as predictable as destructors in C++, with the added benefit of resources being eventually closed in case you forget to close them (whereas a memory leak in C++ will never close said resource).


The "A" in ARC is a bit of an euphemism. You basically move the coding of your memory management into the type signatures, which don't write themselves automatically either. The amout of thoughts you have to make during development is the same.


I thought the A was for atomic: ARC is atomic reference counting, RC is non-atomic.



My mistake: in Rust it is atomic, in Objective-C it is automatic.


For simple stateless request/response services, restricting yourself to non-GC techniques like refcounting, or other techniques for managing objects with simple lifetimes, shouldn't be too difficult either. The benefit you'd get from this is avoiding sporadic GC pauses messing up your latencies and having to resort to really nasty GC tuning.


> I'm curious, is there something I'm missing?

Working in large teams with disparate skill levels and high attrition, e.g. typical corporate jobs.

Manual memory management never works, because there is always someone that does the wrong thing that leads either to blow ups or security exploits, that take experts days to weeks to track down.

ARC is one step up, but requires deep knowledge of the code to know where to place weak annotations. It also hinders performance in thread-heavy code, and those cascading deletions of data structures are no better than a GC pause.

Yes, it is also possible to "leak" in GCs, but it is not a leak in the C sense, and they are quite easy to find with something like VisualVM or MAT.

Lifecycle of resources is very deterministic in GC languages, provided one uses the proper language constructs, such as using/with/try/scope/defer or higher-order functions.
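
As an aside, since the thread is about Go: a minimal sketch (mine, not the parent's; copyConfig and the package name are made up) of what deterministic resource handling with defer looks like under a GC. The language construct, not the collector, decides when the file handle is released:

  package resources

  import (
      "io"
      "os"
  )

  // copyConfig copies the file at path into dst. The OS file handle is
  // released deterministically by defer when the function returns; the GC
  // only has to reclaim the memory of the *os.File value, whenever it
  // gets around to it.
  func copyConfig(dst io.Writer, path string) error {
      f, err := os.Open(path)
      if err != nil {
          return err
      }
      defer f.Close() // closes at function exit, deterministically

      _, err = io.Copy(dst, f)
      return err
  }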

Regarding the languages you mention, Ruby's GC is not in the same ballpark as Java's or .NET's, for example.

Objective-C's GC was never going to be any example of performance given the constraints of working with C, to the point that it was a huge failure and triggered Apple to switch to ARC instead.

Java has plenty of GCs from several certified JVMs to choose from. Not all of them are the same. Also, only with Java 7 and 8 did it acquire the necessary mechanisms (try-with-resources and lambdas) to have deterministic resource usage with a GC.


> Manual memory management never works, because there is always someone that does the wrong thing that leads either to blow ups or security exploits, that take experts days to weeks to track down.

This only applies to unsafe manual memory management. Not all manual memory management is unsafe.

> Lifecycle of resources is very deterministic in GC languages, provided one uses the proper language constructs, such as using/with/try/scope/defer or higher-order functions.

Only for objects that have lexically scoped lifetimes. It doesn't help with prompt reclamation of objects with dynamic lifetimes (e.g. a file that two threads are holding onto, which you want to close when both threads shut down).


> This only applies to unsafe manual memory management. Not all manual memory management is unsafe.

There is no such thing as safe manual memory management; by definition, manual memory management is unsafe.

Assuming you are defending Rust here: if the compiler, i.e. a tool, is doing validation of memory management, it is no longer manual.

As such I don't consider compiler verified memory management as manual.

> ... file that two threads are holding onto, which you want to close when both threads shut down

Easy: use a region and close the file when all threads have joined. Let the thread runtime help the GC manage the resources.

Similar solutions can be applied to other types of resources.
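
A sketch of that suggestion in Go, as I read it (processShared and its parameters are invented for illustration): the file is handed to several workers and closed only once they have all joined, leaving just the memory for the GC to reclaim later.

  package sharedfile

  import (
      "os"
      "sync"
  )

  // processShared opens one file, lets n workers read their own region with
  // ReadAt (safe for concurrent use), and closes the file only after every
  // worker has joined: "close the file when all threads join".
  func processShared(path string, n int, chunk int64, work func([]byte)) error {
      f, err := os.Open(path)
      if err != nil {
          return err
      }

      var wg sync.WaitGroup
      for i := 0; i < n; i++ {
          wg.Add(1)
          go func(off int64) {
              defer wg.Done()
              buf := make([]byte, chunk)
              m, _ := f.ReadAt(buf, off) // a short read/EOF just truncates the chunk
              work(buf[:m])
          }(int64(i) * chunk)
      }

      wg.Wait()        // every worker has joined...
      return f.Close() // ...so the resource can be released promptly
  }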


This is the one HN post that I agree with above all others. I also thought I was in a small minority holding this view. I recall being told 15 years ago that I was "a dinosaur" for being skeptical about GC. I'm now a black-belt JVM GC tuner, but that has only served to confirm my feelings about GC. You're exchanging the solution of some admittedly tricky problems for the acquisition of new, much worse problems (pauses, inability to reason about memory usage).


Why do you think GC makes you unable to reason about memory usage? It's certainly possible to write code where ownership/lifetimes are extremely unclear, and I've cursed code like that before, but that's not the same as making it impossible to reason about memory usage.


> pauses

pause-less GCs have been proven possible

> I recall being told 15 years ago that I was "a dinosaur" for being skeptical about GC.

15 years ago we neither had today's GCs nor today's alternative solutions. Whatever informed your gut feelings back then might not really be all that relevant today.

For all the praise Rust gets, its advanced ownership model is fairly new in anything that even remotely approaches a mainstream language.


Compared to GC the other options have several important disadvantages.

Rust style memory management:

- Additional complexity in the language and in your code

- You need unsafe blocks for code that does something interesting with pointers

- Can leak unreachable memory even in safe code

- Still has pauses due to deallocating large reference chains, or puts them on a deferred free list in which case you no longer have prompt deallocation

Refcounting with weak refs:

- Unsafe

- A little bit of extra work

- Can easily introduce memory leaks, especially with closures

- Refcounts have overhead in time and memory

- High contention on the refcounts in multithreaded code, or can't share data among threads at all (or need hazard pointers or RCU)

- Still has pauses due to deallocating large reference chains, or puts them on a deferred free list in which case you no longer have prompt deallocation

Most code is actually fine with GC, and some code is significantly simpler with GC, such as lock free data structures.

Moreover some of the problems with GC are not fundamental, and can be fixed:

- Pause times can be much lower, e.g. realtime GC's and Azul GC

- The main problem with languages like Java is that they have pointers and heap allocation everywhere. If you used the same data in a refcounted language it would be horribly slow as well. In C# with support for value types this is a bit better but still not ideal.


> The only true benefit I can think of is reducing heap fragmentation

Unless something has changed recently in the literature, concurrent data structures, especially lock-free ones, are super difficult to get right without garbage collection. Further, lock-free structures are one of the most straightforward paths to large, non-contended, concurrent in-memory data sets becoming the norm rather than the exception.

I am by no means an expert on this topic, so it's entirely possible the state of the art has changed on this, but that is the obvious case I thought of when asking the question about missing something.


They are easier with GC, yes. But hazard pointers work reasonably well as a substitute.

(Note that you need generics for lock-free data structures to be ergonomic.)


... and I'll ask you the same question I always do: in that case, adding predictable, low-latency arenas to GC languages is much easier than adding hazard pointers (which are really a form of GC) to non-GC languages. Why should RC ever be the default?


> in that case, adding predictable, low-latency arenas to GC languages is much easier than adding hazard pointers (which are really a form of GC) to non-GC languages.

I disagree with that. I haven't seen "opt-out GC" work that well in practice. People use memory pools in GC'd environments to work around slow GCs, sure, but they're limited and have poor safety/ergonomics, as nobody wants to have to explicitly free data. They also don't really provide one of the greatest advantages of manual memory management, which is the ability to use stack allocation aggressively. (By "aggressively" I mean "in ways that an intraprocedural analysis with no knowledge of data lifetimes—i.e. an escape analysis—could not prove safe".)
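
For what it's worth, this is roughly what the pool pattern looks like in Go with sync.Pool (a hedged sketch; bufPool and render are made-up names), including the explicit Put that is exactly the ergonomics wart described above:

  package pools

  import (
      "bytes"
      "sync"
  )

  // bufPool reuses buffers to cut allocation and GC pressure -- but only if
  // every caller remembers the explicit Put, which is the ergonomics problem
  // with opt-out pools in a GC'd language.
  var bufPool = sync.Pool{
      New: func() interface{} { return new(bytes.Buffer) },
  }

  func render(name string) string {
      buf := bufPool.Get().(*bytes.Buffer)
      buf.Reset()
      defer bufPool.Put(buf) // the manual "free" step

      buf.WriteString("hello, ")
      buf.WriteString(name)
      return buf.String()
  }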

Hazard pointers are a small feature, localized to exactly where you need them (concurrent data structures). Those who need concurrent data structures have to pay for the feature; those who don't don't have to. A pervasive GC, however, imposes its costs and benefits on the entire language ecosystem.


> I haven't seen "opt-out GC" work that well in practice.

RTSJ (realtime Java). I've personally used it in a missile-defense system and it works perfectly, with hard realtime guarantees.

As I've said before, in-memory data comes in four flavors: stack, arena, permanent and arbitrary. It's very easy to have all four in a GC language, as RTSJ does very effectively.

> A pervasive GC, however, imposes its costs and benefits on the entire language ecosystem.

I would say the same about RC. RC and GC are both really good for a specific environment (constrained and unconstrained RAM respectively) and pretty bad for the other. Adding arenas and permanent areas for GC environments is just as easy/natural as for RC/manual environments. And, again, you get RC + arenas which are great for constrained environments and pretty bad for unconstrained, and GC + arenas which are great for unconstrained and pretty bad for constrained.


I'm just as confused as you are about most implementations of garbage collection. There's one that feels like it works, though: Erlang's.

Erlang doesn't do anything different in its GC algorithms or anything; it just gives each (really tiny) process its own heap, so there are lots of heaps, which can each be GCed at some opportune time when that particular (tiny) process isn't scheduled. It's stop-the-world GC, but with high-granularity worlds.

This seems to suggest, to me, that the "problem" of GC is not in its implementation algorithm. Generational, concurrent, real-time, whatever other clever properties: not important. A GC pass is problematic in proportion to the size of the heap. So, make smaller heaps. The easiest way is smaller processes.


> Generational, concurrent, real-time, whatever other clever properties: not important.

Those properties cut down the pause time in relation to the heap size. The more sophisticated your GC the larger your heap can be for the same pause time.

Until you reach the point where you have a fully-concurrent, pause-free GC.

Of course pause-free GCs aren't free lunch either. They incur increased memory traffic, require more CPU cycles per megabyte collected, require more headroom (i.e. larger memory footprint) to maintain their properties...


This works for Erlang 'cuz it's functional. Not gunna work for something like Go.

Something like ARC would work for Go though (right?), & I agree it's a really nice approach, and I don't really get why it's not more widely used.


It works for Erlang because of its process model. I don't think it has anything to do with it being functional.


Yup, you're right. I just meant that to have a program composed of many processes that can be stopped & collected independently, those processes have to communicate 'functionally', i.e., without shared mutable memory. (So this isn't a viable approach for languages like Go where all the goroutines share memory.)


> So this isn't a viable approach for languages like Go where all the goroutines share memory.

Something Go has been criticised for since it was originally unveiled back in 2009.


In Erlang, old data structures can't reference new data. This simplifies GC a lot.


Immutability is orthogonal to FP as a whole.


Whether that works or not depends on the application. It doesn't work for large in-memory datasets that are accessed by multiple processes.


To be clear, modern GCs are truly impressive -- I just don't think that GCs are how we should be addressing memory management. They impress me in the way that, let's say, a Rube Goldberg machine is an impressive way to make toast. I respect the craftsmanship, the creativity, the ingenuity ^_^ but I'd never make toast that way.


Good luck making a functional language like Clojure work without a GC.


Do you think ARC is a viable alternative? I'm uncertain how the performance compares to GC for most applications.


Totally! Or RAII, but that requires much more re-thinking of how your program is put together to do it well.

Performance depends on a lot of things. For instance, if you have 'unlimited' memory, GC will be way faster because most likely the collector will never run, and allocations are extremely fast since you can just hack off a slice and hand it out. But in that world, relying on something like a finalizer is impossible since your object will never be collected -- ever.

In the real world, memory isn't unbounded. Collection can happen at any time and requires traversing large amounts of memory to see what is free and what isn't. If the heap is small, that's fine. If the heap is large, that problem can become truly enormous. And collection is totally non-deterministic.

On the other hand, ARC's simplest implementation inserts calls to increment and decrement the reference count on your behalf. Plenty of optimizations are possible -- i.e. if the compiler can statically validate the lifecycle of your object at compile time it can cut a lot of these out. And Apple does some amazing runtime magic, too. In return, your objects never outlive their scope, collection happens in-line, deterministically and predictably. Memory usage is much lower and no traversals of your object graph ever happen.


With ARC, sharing objects across threads becomes trickier. First of all, in the general case you'll have to do atomic increments and decrements, which tend to be fairly expensive, and they'll be sitting right in the middle of your core application logic. Secondly, if you're sharing objects amongst threads, you cannot simply do:

  Thread1:
  Obj = Heap->field;
  Heap->field = null;
  // reduced a reference so:
  if (AtomicDecrement(Obj->refcount) == 0) {
    free(Obj);
  }
Since you'll be racing with

  Thread2:
  Obj = Heap->field;               // reads the field just before Thread1 nulls it
  // ... stalls here, and Thread1 frees Obj ...
  AtomicIncrement(Obj->refcount);  // use-after-free: Obj may already be gone
The only satisfactory solution to this that I'm aware of is to use hazard pointers, and that is a fairly complex bit of logic. Maybe there's a better solution to this, but I've not come across one.


Two things you get wrong here:

1. Collection cannot happen at any time. It's not like the GC just decides to do a collection. It happens on allocation. If you don't allocate, you don't collect.

2. Most garbage collectors don't scan the entire heap. You don't scan dead objects. You could have the largest heap in the world, but if you, at collection time, only use 100MB worth of RAM, you only ever scan 100MB. Copying collectors also give you memory defragmentation for free, which ensures quick allocation times, something ARC cannot do.


I am convinced that GC can cause issues with object lifetimes and pauses (at least without a pauseless GC).

Like sanjoy points out though: https://news.ycombinator.com/item?id=9856234 RC could involve atomic increment/decrement. I haven't seen any macro benchmarks about ARC performance vs manual memory management and/or GC.

I've seen anecdotes like http://stackoverflow.com/questions/12527286/are-there-any-co... or microbenchmarks (from the same page) but nothing at a larger scale.


> I just don't really understand why people continue to work on garbage collection.

> ... but I think it's time to walk away from GC

Well, look at the arguments. GC advantages:

  * less memory (unless you use a semi-space GC)
  * faster
  * can handle cyclic refs (graphs, not only trees and linked lists)
  * trivial to use, less programmer errors
What you got wrong: significant memory and CPU overhead.

If you compare the memory overhead and complexity of refcounts with malloc to a non-semi-space GC, you'll be surprised. malloc is more complex and worse in its CPU overhead, and refcounts in every data cell are a huge overhead. GCs have it better.

Where GCs have a memory overhead (copying collectors), they do it on purpose, on machines which do have enough memory. Those copying collectors are the fastest, but cannot be used on small devices. With not enough RAM, you just use a trivial mark & sweep, which is much simpler than your malloc implementation and manual refcounts.

The only real field where you cannot use a GC is when you cannot tolerate pauses, as in real-time, latency-critical apps. But even there, incremental GCs (like Boehm) with real-time characteristics exist. The other case is when you need immediate destruction of objects as they go out of scope, and not when the GC decides to destroy them later on. This can be solved by the compiler, but usually isn't.

Of course people always walk away from GC and avoid it like the plague. GC people, on the other hand, feel memory is too important to be trusted to programmers. We've had these discussions for decades.


From gamedev perspective:

Every time I've prototyped something in a GC'd language (Java, ActionScript, JavaScript, ...) I soon find myself optimizing memory allocations instead of making the game, because the GC causes too much overhead and/or too-long pauses and therefore makes the game feel miserable. It doesn't matter if there are some theoretical GC models which have < 1ms pauses with gigabytes of heap data, as the current implementations don't seem to be anywhere near that.

Nowadays I prefer to use C not because I like malloc, but because it makes handling different memory allocation schemes trivial (unlike C++). Furthermore, in this semi-real-time case the performance of malloc is irrelevant because when you want to be fast and low-latency, you pre-allocate everything and use lightweight functions to further distribute memory from those blocks.

In conclusion, I've found that manual memory management lets me focus more on the task at hand, while GC doesn't.


I bet it only works because you work in small teams with highly skilled developers.


Most of the complexity of a good malloc (jemalloc/tcmalloc) is going to be in its thread-local caches and in its placement heuristics, both of which are also necessary for a good GC (for the tenured generation, in the latter case). Accounting necessary for the tenured generation is also comparable to accounting for a malloc implementation. Fragmentation also isn't much of an issue anymore with modern allocation schemes. When you add in the complexity and overhead of tri-color marking and write/read barriers, the overhead often isn't in GC's favor, compared to a well-tuned program that uses the stack where possible and a good malloc implementation for long-lived objects only.


Also, Azul Systems solved that issue - the proprietary C4 GC has no pauses, and the heap can be huge, like hundreds of GB. If that tech became commonplace, maybe this discussion would be obsolete. But I think C4 requires kernel support, and the first attempt to get a patch accepted didn't go well.


To clarify, Azul's Zing does have pauses, but they optimized the crap out of them (the pauses are more time-to-safepoint than GC pauses). GC time wrt application stopped time is constant regardless of heap size.

(I'm an Azul customer and Zing user)


Thanks, the marketing implies it's pause-less. What are typical application stop times?


My mean pauses are a few hundred micros. Standard deviation is slightly more (300-500 micros), with a max of a millisecond or two.


Azul also sacrifices throughput relative to the HotSpot GC to achieve very low pause times: the overhead of a GC is not just in its pause times! There's no free lunch.


YES, if you take all the pluses from multiple types of GC, it's gonna look awesome. In reality you have one GC; if you're lucky, maybe your toolchain supports multiple and you can swap them.

So discussions about "GC" in the abstract are useless; discussions about the GC in Go 1.5 are absolutely worth it, and then you can find real-world stories about GC failures.


Can you point me to the source of your claim about memory overhead? The Go team specifically states their goals for the 1.5 garbage collector as follows:

"Hardware provisioning should allow for in-memory heap sizes twice as large as reachable memory and 25% of CPU cycles". http://llvm.cc/t/go-1-4-garbage-collection-plan-and-roadmap-...

As far as I know, that collector is not a copying collector. I also know that all JDK GCs have significant memory overhead (not sure about the Azul one).


They are still using a simple and slow GC, non-copying, tri-color M&S, but at least incremental.

So they need to scan the complete heap, while a good copying collector (e.g. a two-finger Cheney with forwarding pointers) would only need to scan the stack and some roots. My GC needs ~4ms on normal heap sizes; the fastest M&S GCs need ~150ms. But I haven't found a good version besides the Azul one, which works fine threaded.

Memory overhead: A non-copying GC has none. Look up a GC book or explanation. Refcounts have plenty: 1 word per object. malloc has plenty for its free-list management, growing with the heap size.

A semi-space GC is different as it reserves for every heap segment a mirror segment. A fast version reserves max heap and divides it by 2, so can use max 2GB of 4GB. A normal version can do that incrementally.

Java has a huge memory overhead from the kernel and run-time alone, not so much the GC, but since they have various swappable GCs you need space for that. You can write better and smaller run-times with GC which run circles around Java, .NET, Go or Ruby. As I said, mine needs ~4ms; the fastest Java is ~20-150ms. V8 has a good one, but I don't know their stats off the top of my head.


>Memory overhead: A non-copying GC has none

Do you have any idea what could make the Go folks ask for "in-memory heap sizes twice as large as reachable memory"? This seems to be completely at odds with what you are saying.


Why? Reserving virtual memory has nothing to do with actual memory usage.


It would be great if they really mean virtual memory, but I very much doubt it as it makes no sense as a goal and is inconsistent with mentioning "Hardware provisioning".


I tried making a doubly-linked list in Rust the other day. I found it hard to do without resorting to 'unsafe' code. I don't think a data structure like a HAMT would be a walk in the park either.

The thing about GC is that it makes it easy to write code. That's the main benefit. I don't want to deal with weak/strong references or ownership, just write code. True, a GC has downsides like increased memory footprint and possibly long pause times, but for most of the software I'll ever write it will never be a problem.


You can write a doubly-linked list rather easily with `RefCell<Rc<..>>`/`RefCell<Weak<..>>`.


It's even easier in a GC based system.


ARC won't work when you cannot determine ahead of time where the cycle can end safely. Pretty sure for an interesting set of algorithms this is a problem.

It's something I find annoying about some ARC/Rust enthusiasts: their belief that because they haven't found a need for a GC, there isn't one.


That's why we have the 'weak' keyword ^_^ I'm confident any algorithm you want to design for a GC world can be implemented in an ARC world, maybe with a tiny bit of tweaking. It's not that there's 'no need for GC' its just that there are many ways to solve a problem.


Your "confidence" is wrong.

Consider a graph which can have nodes added, and edges added or removed. Say you want to keep track of the part of the graph that is connected to a node.

You can't know ahead of time which references can be weak. So all must be strong and you will leak memory when a circular part of the graph becomes disconnected.

In general, garbage collection is a difficult problem that solves a lot. You can't trivially do without it. It's not just a tool of laziness.


Obviously you could implement any algorithm using reference counting instead of garbage collection. The question is whether or not you really want to pepper your code with what is essentially manual memory management when necessary.


ARC languages are Turing complete, but some algorithms require more than just "peppering weakrefs" and actually require you to essentially build your own GC.


Modern allocators like jemalloc are very impressive at reducing heap fragmentation. In the non-moving-allocator world this is now almost a solved problem.


Persistent data structures with structural sharing (eg, Clojure's data structures) benefit substantially from GC.


One thing that's rather difficult to implement without a GC is concurrent, wait-free data structures.

When you have multiple threads trying to unlink structural elements they have to back off and leave the work to other threads, otherwise you will get contention on the pointers which breaks the wait-free guarantee.

With a GC you can just null out a reference. Or let another thread do the nulling.

With manual memory management you can only call free once. With ownership a thread cannot back off because it has taken ownership for that chunk of the data structure. With reference counting you either have contention from the count fields or vastly inflated footprint for striped counters.
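
To make that concrete, a sketch (mine, not from the thread) of a lock-free stack in Go: pop simply unlinks a node and walks away, and the collector reclaims it once no other thread still holds a reference. That is exactly the spot where a non-GC version needs hazard pointers, epochs or refcounts.

  package lockfree

  import (
      "sync/atomic"
      "unsafe"
  )

  type node struct {
      value int
      next  unsafe.Pointer // *node
  }

  type stack struct {
      head unsafe.Pointer // *node
  }

  func (s *stack) push(v int) {
      n := &node{value: v}
      for {
          old := atomic.LoadPointer(&s.head)
          n.next = old
          if atomic.CompareAndSwapPointer(&s.head, old, unsafe.Pointer(n)) {
              return
          }
      }
  }

  func (s *stack) pop() (int, bool) {
      for {
          old := atomic.LoadPointer(&s.head)
          if old == nil {
              return 0, false
          }
          n := (*node)(old)
          next := atomic.LoadPointer(&n.next)
          if atomic.CompareAndSwapPointer(&s.head, old, next) {
              // Nothing to free here: other threads may still be reading n,
              // and the GC collects it once they are done. Without a GC this
              // is where the back-off/hazard-pointer machinery would go.
              return n.value, true
          }
      }
  }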


> non-deterministic pauses

Modern GC can provide deterministic pauses (see Azul's pauseless GC, and various real-time Java GCs).

> I'm curious, is there something I'm missing?

I think so. Consider modern servers with plenty of cores and plenty of RAM. The only way to make efficient use of that RAM is to store as much program data in it as possible, and the only way to provide really efficient access to all that data is without forcing sharding, and the only way to do that is with GCs.


GC is a lot faster throughput-wise for programs that create a lot of short-lived objects. ARC will call malloc and free for each object, which is actually fairly expensive. A good GC will bump a pointer for each object and do nothing for each one that dies before the next GC cycle. Java programs allocate objects at rates that the host malloc simply could not sustain in many cases.


> ...there has been a virtuous cycle between software and hardware development. CPU hardware improves, which enables faster software to be written, which in turn...

This is the exact opposite of the experience I've had with (most) software. A new CPU with a higher clock speed makes existing software faster, but most new software written for the new CPU will burn all of the extra CPU cycles on more layers of abstraction or poorly written code until it runs at the same speed that old code ran on the old CPU. I'm impressed that hardware designers and compiler authors can do their jobs well enough to make this sort of bloated software (e.g. multiple gigabytes for a word processor or image editor) succeed in spite of itself.

There are of course CPU advancements that make a huge performance difference when used properly (e.g. SSE, multiple cores in consumer machines) and some applications will use them to great effect, but these seem to be few and far between.


It's probably fair to call that a reasonable characterization of Go's "home problem domain", though. Contrary to the popular belief that "cloud" means you can just write crappy code and throw cheap hardware at it, when you are truly working at cloud scale you actually try to write software as lean and mean as possible, because everybody cares about 10x and 100x differences in the amount of hardware that a particular cloud service takes.

Yes, desktops continue to be gluttonous hogs, because you'd rather have your software now than glorious software three years from now (which you can't have anyhow because the company went out of business trying to polish it instead of releasing it). Lately "mobile" is really pushing it, I think. But in Go's wheelhouse, efficiency has actually manifested.

I've often thought that there is less difference than you'd think between embedded programming and cloud programming; both groups of programmers may literally be counting cycles and watching their L1 caches. It's those in the middle who have more power sitting around than they know what to do with who can afford to be a bit "lazy".


I'm not sure users really care how many gigabytes their word processor is. How fast it is is probably more interesting to them. And while wasting CPU cycles on abstraction layers isn't a great way to make a super fast program, if the program still runs at 60fps and took half as much time to develop, then maybe they're worth it.

Of course, when you end up with some standards-driven monstrosity like a modern web browser, you do seem to have a lot of unnecessary abstraction layers and also it's slow.


> and took half as much time to develop

Abstractions make coding faster now?


No, it'd be way faster to just do it all in assembly. These fancy "high level languages" and "memory management" and "libraries" are just cons foisted on poor unsuspecting programmers by middle management and enthusiastic marketers. Real Programmers (TM) don't need any of that shit.

(/s)


You don't seem to understand what "abstraction" means in computer science. Hint: it has nothing to do with memory management. High-level languages like Python/Ruby actually have fewer abstractions than lower-level languages like Java because they don't need them, and Python/Ruby programmers tend to want to get stuff done rather than write an ode to the Gang of Four in XML.


Isn't that the point? Why else use abstractions if not to make yourself more productive?


Well if I asked your average brain-dead Java developer it would be to make your code more "generic" so you don't have to change a single line of code when requirements change, just tweak some XML somewhere!

And if there is one thing that Java developers are not, it is productive. I will usually be finishing off a project in Python while they are still coding getters/setters on their AbstractProxyFactoryFactory class.


Yes, Java developers still hand code their getters and setters. What era are you from, again? Also, those "brain dead" Java developers still write code that smokes your dog-slow Python code regardless of how meticulously you hand-crafted your code. So yeah, I would be bitter too.


> What era are you from, again?

Uh, the one following hordes of brain-dead Java developers hand-coding getters and setters?

> Also, those "brain dead" Java developers still write code that smokes your dog-slow Python code regardless of how meticulously you hand-crafted your code.

And is delivered 2 years later, requires 5 times more people and costs 10 times more to develop. And Python can be plenty fast if you use the right libraries.

> So yeah, I would be bitter too.

What do I have to be bitter about? I get paid well to write Python, C and Go on bespoke and interesting back-end systems and don't have to attend daily stand-ups with brain-dead Java developers and Oracle DBAs and listen to them duke it out over whose fault it is that queries are running slow. No thanks.


>Uh, the one following hordes of brain-dead Java developers hand-coding getters and setters?

I can only imagine what they think of you, and it's probably not very good. You must be a nightmare to work with and be around. Which is probably why they keep you away from people.

>And is delivered 2 years later, requires 5 times more people and costs 10 times more to develop. And Python can be plenty fast if you use the right libraries.

Sounds very anecdotal. This seems to be your modus operandi. I could say the same thing about worthless Python developers. No wonder that language is in decline and seldom used in the enterprise.

>What do I have to be bitter about? I get paid well to write Python, C and Go on bespoke and interesting back-end systems and don't have to attend daily stand-ups with brain-dead Java developers and Oracle DBAs and listen to them duke it out over whose fault it is that queries are running slow. No thanks.

Again with the idiotic anecdotes. You must have been exposed to a work environment that was not congruent to that of your own, but somehow think it's the blueprint. But, make no mistake, you are a bitter man and if Java contributed to that then I'm thankful for its existence.


I'd say the need to use IDE-generated code implies there are missing language features. That's why Kotlin looks really exciting.


I agree that IDE-generated code is a bad sign (though honestly a lot of use of getters/setters is brain-dead - they make sense for a library but in application code public fields are fine). But Kotlin means paying all the costs of using Scala (which is already production-ready and more widely supported) but getting very few of the benefits.


You're comparing libraries here. There are libraries in Java which are not over-engineered. That being said, even if all Java libraries were over-engineered, it would not make you correct. Correctly crafted abstractions make you more productive.


It's just far easier to write bloated software than it is to write efficient software. Particularly in the "just get it done" atmosphere most programmers operate under.


I would say it's a trade-off between writetime efficiency and runtime efficiency. The trend has been that runtime efficiency matters less and less. Compare the modern idea of "efficient" software to the binary that early machines were programmed in.

With Moore's law possibly ending and current memory access speeds though, it may be starting to matter a lot moore.


> This is the exact opposite of the experience I've had with (most) software. A new CPU with a higher clock speed makes existing software faster, but most new software written for the new CPU will burn all of the extra CPU cycles on more layers of abstraction or poorly written code until it runs at the same speed that old code ran on the old CPU.

You could also see it in a positive light: the higher processor speed allows more abstraction layers, which makes development easier and faster (if done right, of course).

ORMs and web frameworks make it much, much faster to develop a CRUD web app than the alternative of writing everything yourself. Of course, you pay with performance, but you gain in time to delivery, which in turn translates to more features in the same time.


Why would I want to solve the same problem I was solving last year, but just a little bit faster (or more times, if you prefer)?

If it was good enough last year, it's probably good enough today. Today I will solve new problems with even more resources.


Was that sarcasm?


I know it's a common complaint amongst a certain set to remember some bizarre version of the good old days when software was lean and mean and usable and did all the work our modern software does, but that time literally never existed. What you call bloat, most people either call "usability and features" or simply don't notice at all. The fact that you (apparently) don't like usability and features and prefer to call them bloat doesn't actually give any veracity to your opinions - certainly not enough to make such assertive unsupported statements.


That's not true. There are many websites today that have identical or less functionality than in the past, and they're just SLOW. So many sites I visit do an inane amount of work to load up a static site. And they scroll poorly, they feel laggy. There's no new functionality, except as far as the developer goes - they're now doing databinding on the client, loading content at runtime (vs sending back rendered HTML), etc.

Edit: I'd also add "on the web" continues to be an excuse for slow, unresponsive software. Even in ~96 or so, I remember folks getting excited. "Look at this online frog dissection thing!" ... It was crappier than what you could do with even a small download. But it was on the web so it was hot. Same thing now.


Websites are a great example since the bloat is clearly visible on the developer's end. I decided to try using Foundation last week. The last time I worked on a website I just generated some static HTML markup and filled it in with my own hand-built CSS. Nobody would give an award for that design, but it was functional enough.

The Foundation install instructions first ask for three dependencies: Ruby, node.js, and git. Subsequently, through gem and npm, dozens of other dependencies were installed. The framework pushes multiple .js dependencies onto the served pages, and heavily encourages using either Compass or libsass to generate the final CSS, making it weigh considerably more. It even had tooling to automatically rebuild as a background process.

All of this, and I basically just modified the example templates slightly to help me build a layout. In theory all sorts of other things could be done with that framework, but the nature of the software stack we have, at least in this domain, favors including the kitchen sink when you need a cup of water.


Don't start on node. A dozen thousand files of dependencies because every function needs its own module containing at least 6 files. And using RequireJS, well it takes 30s+ to build this site even though it's not doing anything earth shattering. And that's before running uglify or any such minimizer. I don't get it.


It could still be a case of one person's bloat being another's feature. Though for a website, the other people could be advertisers, people who develop the site and people who publish content to it. There was a Mozilla report not too long ago that found that just disabling tracking (not ads) reduced page load times by almost half.


In many cases it's not. In fact, I've recently worked on several projects where the frontend is stupidly heavy for zero reason. Just sloppy or over-engineered code.

I know the full extent of the capabilities - one is just a corporate website with no interactivity. It's just dumb. The previous version was just simple static HTML; but as part of the "responsive design overhaul" it turned into this behemoth that makes several dozen requests to open the homepage. Nuts.



I am a bit surprised by most of the discussion here so far. Garbage collection has first of all one fundamental advantage: correctness. You are guaranteed never ever to have a pointer to a freed object and that any unreachable object does get freed. For almost all programs that get written, correctness should go over speed.

And speaking of speed, unless you require hard real-time behavior, garbage collection can be quite beneficial. A generational GC offers faster allocation times than any malloc-based allocator, and the collection of the nursery generation is instantaneous in most cases. ARC has the overhead of counting for each referencing/dereferencing, and while it might be predictable about kicking in when killing a reference frees memory, the time required to free a given object completely depends on how many objects get consequently freed.
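
To make the 'faster allocation' point concrete: allocating into a nursery is essentially a pointer bump. A toy arena in Go shows the shape of it (an illustration of bump allocation in general, not of how any particular collector is implemented):

  package bump

  // arena is a toy bump allocator: handing out memory is a bounds check and
  // an offset increment, which is why allocating into a nursery (or any
  // copying/compacting space) is so much cheaper than a malloc free-list search.
  type arena struct {
      buf []byte
      off int
  }

  func newArena(size int) *arena { return &arena{buf: make([]byte, size)} }

  // alloc returns n bytes by bumping the offset, or nil when the arena is
  // full -- the point at which a real collector would collect or evacuate.
  func (a *arena) alloc(n int) []byte {
      if a.off+n > len(a.buf) {
          return nil
      }
      p := a.buf[a.off : a.off+n]
      a.off += n
      return p
  }

  // reset "frees" everything at once, analogous to evacuating the survivors
  // of a nursery and reusing the space.
  func (a *arena) reset() { a.off = 0 }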

Furthermore, garbage collection helps to write clean code, as it is safe (and usually cheap) to allocate memory during a function call and return results referencing the memory.

Of course, badly written programs might perform badly with GC - but without GC the same kind of programs would just be a disaster. And most strategies for efficient memory usage used in non-GC languages (e.g. memory pools for certain objects) can and should be equally used in GC languages.


Erlang's per-process (Erlang process, not Unix process) GC is pretty good from this point of view. I'm surprised they didn't mention it as something to think about.


Why would you mention it? Go shares all memory, Erlang doesn't. It is a different problem.


Actually, IIRC Erlang does have a shared heap, but it is used as an optimization to avoid copying very large objects.


That doesn't contradict what I said. Go shares ALL memory; Erlang may share some, sometimes.


I've gone through the exercise of enabling something like per-process GC in Go by using multiple processes and the standard Go RPC library. It's easier to get to that point in Erlang, because you're already there by default.

Very good low-latency GC is a good alternative to per-process GC, especially combined with Go's ability to let you write to avoid GC pressure.
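
One common way to 'write to avoid GC pressure' in Go, sketched here with made-up names and fields: keep the hot data in pointer-free values so that each collection has far fewer pointers to trace.

  package hotdata

  // score is a pointer-free value type. Storing it by value -- e.g. in a
  // map[uint64]score or a []score -- leaves the collector with far fewer
  // pointers to trace than a map[string]*Score full of small heap objects.
  type score struct {
      popularity float64
      updatedAt  int64 // unix seconds instead of *time.Time
  }

  // table keys by a pre-hashed uint64 so millions of entries don't keep
  // millions of string headers (which contain pointers) live across GC cycles.
  type table struct {
      scores map[uint64]score
  }

  func newTable() *table { return &table{scores: make(map[uint64]score)} }

  func (t *table) set(key uint64, s score) { t.scores[key] = s }

  func (t *table) get(key uint64) (score, bool) {
      s, ok := t.scores[key]
      return s, ok
  }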


Interesting. I think the GC pauses are Go's biggest problem.

Looks like they are tackling this head on with positive results.


Have you encountered any problems with Go GC pauses?


Yes, I have. We used Go in a system that had to keep track of essentially large hashes containing popularity/scoring information, in addition to what were effectively routing tables, and ran into >500ms pauses.

At scale, it seemed the largest part of the complexity of Go was manipulating data structures and code to avoid GC pauses. With sufficient work we may have been able to decrease the pauses enough, but we also ran into raw requests-per-second numbers that were lower than we liked.

The direction we have taken was to switch to C++ for this application. Having said that, the GC pauses were the primary reason for the change.


If it's possible, and you're interested in trying, it would be interesting to pull that code out and try it again with Go 1.5, if it's easy.

If you've got a C++ solution, I would not suggest under any circumstances short of Go suddenly and frankly mysteriously blowing the doors off of C++ that you switch... I'm just saying it would be an interesting comparison.


I'll be recommending we try it in a lab with go 1.5, absolutely.


Why didn't you just use Java? HotSpot has been optimized for over a decade to make (among other things) GC pauses as manageable as possible. And there are other runtimes (http://www.azulsystems.com/) specifically optimized for low latency. The poor quality of Go's GC was never a secret.


Ah, too bad. Do you know what sort of heap sizes you got those pauses at?


About half a gig resident memory.


So Go's GC spent 1ms per MB?


According to this graph: https://pbs.twimg.com/media/CJatKFQUkAE5qcR.png:large

It's about 0.3ms per MB, so 1ms isn't that different.


x86 or x86_64?


x86_64


For many applications, like web backends that may have previously been written in Ruby, Python, or JavaScript, the garbage collector was not noticeable at all. Improvements to the garbage collector target apps that previously would have been written in C with manual memory management, as mentioned in the talk. Not all apps need to be written in C obviously, but this will help bring Go's concurrency primitives to places where the GC was previously a deal breaker, such as eggnet's program.


That's exactly the awkward ground Go is in and why I stopped using it. It's not really necessary for any company's web backend I've worked on; the existing code in Ruby/Python/JS works fine. Yet it can't compete with the other stuff out there: Rust, C(++), Erlang or soon Swift 2.0.

Though admittedly, some of these have a dismal concurrency story, so I've got to hand Go credit where it's deserved.


Was the presentation more in-depth than this summary? I'd love to read or hear more about the changes.


The summary pretty much covers it in terms of what was presented. There definitely would not have been enough time to discuss the changes they made at length.


And with this, I hope we'll see a race for GC pauses improvements.


"Go programs will get a little bit slower in exchange for ensuring lower GC latencies."

How much slower are Go1.5 programs compared to their Go1.4 version? Is this relevant for web apps?



