The Go 1.1 scheduler

davidw · on July 9, 2013

Interesting, looks like they've made progress on this:

https://code.google.com/p/go/issues/detail?id=543

Also, from the article:

> Go garbage collector requires that all threads are stopped when running a collection and that memory must be in a consistent state.

People often ask about how Go differs from Erlang. That one's a fairly large difference under the hood. Erlang does GC on a per process (Erlang process, not OS process) basis.

krobertson · on July 9, 2013

I hadn't looked at that issue before, but have definitely been bit by it.

On our main codebase, we experienced some major issues when moving from Go 1.0 to 1.1 from this exact issue. We had a goroutine that was doing some remote calls wrapped with a timeout, and the call was consistently timing out even though the remote service was perfectly fine (it was another service on the same box).

We found the cause was another goroutine running something in an for loop that didn't do anything that would allow a pause in execution for another goroutine. So, the scheduler just obsessed on that goroutine, running none of the others until that one was done, and by then the timeout on the remote call had expired.

We fixed up that case, but also found a few others were we simply added a very small sleep call... for no other reason than allowing the scheduler to evaluate other goroutines. Meh. It made sense when we finally tracked it down, but was one of those things where we had to pause and ask "really?"... and adding a sleep call with comments "yes, I am really calling sleep".

georgemcbay · on July 9, 2013

While it would be nice if the programmer never had to worry about this situation in the first place, when confronted with something like this I use runtime.Gosched() rather than a sleep call to yield. It more directly performs what you're attempting to do and is much more clearly self-documenting for situations where you really don't need to sleep for whatever period of time but do need to yield.

krobertson · on July 9, 2013

I hadn't seen runtime.Gosched() before, will take another look at it. Mentioned it to a coworker and they already knew of it, so maybe it was always switched to call it instead of sleep. :)

tptacek · on July 9, 2013

I had what I think was a related problem with file descriptor retention in timeouted socket code when writing a simple portscanner in Golang; I had to implement simple time-based flow control to give the runtime time to collect file descriptors, or I'd run out of them.

james4k · on July 9, 2013

If you use Go's time.Sleep(), the goroutine will yield to all other goroutines while it is sleeping. Unfortunately, as georgemcbay says, using the system's sleep impedes Go's runtime as it cannot use that thread to run any goroutines in that time.

But yeah, I would say that is one of the quirks of Go at this time. You definitely need to be aware of how the scheduler works if you're using it in production.

pkulak · on July 9, 2013

This doesn't make a whole lot of sense to me. As long as you have GOMAXPROCS set > 1 this shouldn't happen, right? Go doesn't make a secret about only supporting cooperative multitasking above the OS thread level. Are you saying that one spinning thread was locking up all the others?

EDIT: Okay, saw your reply below. Just to reiterate, Erlang has to make a lot of throughput compromises to support pre-emptive multitasking. Just being compiled pretty much takes Go out of that conversation entirely. I'm happy with the tradeoffs for the kinds of things I need to do. And you can always set GOMAXPROCS to a _multiple_ of the number of cores on your machine to get OS-managed pre-empting.

scott_s · on July 9, 2013

Are you calling a system sleep, or a sleep implemented as part of Go's runtime? If it's a system sleep, and you're on Linux, sched_yield (http://man7.org/linux/man-pages/man2/sched_yield.2.html) may be more appropriate.

But, I acknowledge that is only a slightly-better kludge. It's still not ideal.

edit: georgemcbay's runtime.Gosched() appears to be the "right" kludge, but I'm leaving this up just in case others weren't aware of the system call.

TylerE · on July 9, 2013

What is your GO_MAX_PROCS value? Sounds like your're running single core?

krobertson · on July 9, 2013

Yes... the primary case where it was biting us was with our testing suite. It was a set of functional tests that had multiple goroutines going.

It was something that had always passed and never been an issue on 1.0.3, but started failing 85-90% of the time on Go 1.1 with no code changes. :(

Roboprog · on July 9, 2013

So, does that mean it's safe to say that Erlang must copy the memory of values sent as messages between processes? Or, is there some kind of locking of such (aliased) message data???

JulianMorrison · on July 9, 2013

Messages are deep-copied in Erlang. There are other techniques possible such as uniqueness types, which allow "transfer of ownership", Rust uses those but Erlang does not.

lpgauth · on July 9, 2013

That's not exactly true... in Erlang, any binary with size > 64 bytes will not be copied. Instead a pointer to a shared memory area will be passed.

cpeterso · on July 9, 2013

If pointers are sent between Erlang processes, how does the per-process GC work without stopping the world?

JulianMorrison · on July 9, 2013

Big binaries (which yeah, I forgot about) live off to the side and are not GC'd. They're ref counted.

masklinn · on July 9, 2013

> So, does that mean it's safe to say that Erlang must copy the memory of values sent as messages between processes?

Theoretically not since Erlang data structures are immutable, but outside of binaries which seat in a shared global heap the benefits was never considered to outweight the complexity costs (aliasing analysis and others), so messages are copied indeed.

voidlogic · on July 9, 2013

I would think so unless it stores all message bodies in shared memory and protects them with an enforced COW policy to begin with, or better yet does this with virtual memory with some clever COW mmapping.

hosay123 · on July 9, 2013

"Don't read the design doc, it's too complicated. Instead read this cutpaste!"

The single "lock free" idlep looks like it's just moved the futex contention elsewhere. This will almost certainly bounce like crazy on a many-cored system. Would be interested to see benchmarks of the new design before considering it somehow better.

pron · on July 9, 2013

> The single "lock free" idlep looks like it's just moved the futex contention elsewhere.

Possibly. The Go 1.1 scheduler is inspired by Java's fork/join scheduler, which suffered from the same problem in Java 7. In Java 8 it's been improved to no longer have a single wait-list, and external submissions of tasks (i.e. tasks that are not submitted by tasks running in the thread pool, but elsewhere) are multiplexed randomly (IIRC) among the individual thread queues.

Arnor · on July 9, 2013

I concur. While the article made sense and the decisions seemed logical, I'm shy of the level of knowledge necessary to come up with healthy criticism. A nice quantitative benchmark would clear things up an awful lot.

Arnor · on July 9, 2013

Thanks for this. After reading the article I was able to go to the design document and keep my head above water. Go has a lot going for it, but one underrated aspect is excellent propagation of information about the language. The talks by the likes of Rob Pike and the writing here and elsewhere drives the language in popularity and productivity. We learn the language, true, but we also learn very strong computer science. No wonder people become more productive when they switch to Go! By using it and reading about it, they get better at programming! (I'll grant that this is true of learning new languages in general, but maintain it's especially true with Go...)

bfrog · on July 9, 2013

I've said this before. Go's biggest problem will forever be a blocking GC due to the global heap they've decided to go with versus a hybrid global/goroutine local heap style like the beam erlang vm has.

The problem will show up when people try to fire up millions of goroutines and then wonder "why is my latency suddenly spiking in to the seconds! WTF!"

GhotiFish · on July 9, 2013

That definitely surprised me to learn that go's garbage collection required all threads stop. That can't possibly manifest as anything other than a visible and uncontrollable "THUNK" in your application.

I'd love to see a pros/cons comparison between Go's all at once strategy vs Erlang's per process strategy.

pcwalton · on July 9, 2013

The pros and cons are pretty simple.

Go's advantage is that you can share data cheaply while retaining memory safety [1]. The disadvantage is that you have stop-the-world GC and potential for data races, so you must rely on the race detector. Erlang's advantage is that you have no data races and no stop-the-world GC (and Erlang's GC is easier to implement). The disadvantage is that all messages must be copied and parallel algorithms that require data sharing are more difficult to write.

There are hybrid approaches like Singularity, JS with transferable data structures, and Rust (disclaimer: I work on Rust). These systems use some form of static or dynamic access control scheme (for example, uniqueness or immutability) to control data races and perform memory management for shared data structures, while retaining Erlang's thread-local GC.

[1] With one exception, the memory unsafe data race in maps and slices. See: http://research.swtch.com/gorace

pron · on July 9, 2013

Go has gone the Java route (I find both runtimes to be somewhat similar -- well other than the whole JIT thing -- with Java some years ahead in terms of GC and scheduling), and I suppose that Go, too, will get a concurrent GC sooner or later, but even the concurrent GC on the JVM has a few stop-the-world phases.

rdtsc · on July 10, 2013

Erlang's strengths (and they are pretty much unique for a production level language) are:

* Fault tolerance

* Concurrency

From fault tolerance comes isolation. Don't let a part of your program that crashes affect or crash other unrelated parts of your program. Memory heaps are private for each actor (+/- some refcouting for binaries).

Hot code reloading comes from fault tolerance as well. So do immutable data structures and functional aspects.

As for concurrency. Erlang emphasizes "liveliness" and low reduction over throughput. This is quite rare and is very interesting. It means under concurrent load, it still tries to be responsive. So if 100k clients are connected, and on is performing a CPU intensive job, the other ones shouldn't get socket errors or get blocked. This might come with a trade-off of slowing down that one CPU bound function with frequent interrupts.

Here is a good article on how Erlang's scheduler works:

http://jlouisramblings.blogspot.com/2013/01/how-erlang-does-...

Now Erlang is a tool and it there is no free lunch. All these features you saw above don't come for free. Erlang will be slower in numeric and sequential computational tasks (the language shootout type benchmarks, like finding the shortest path, computer determinants and so on). So in some cases it won't be the answer. You'll have to benchmark and decide for yourself.

bithavoc · on July 9, 2013

I really wish D could have an scheduler and goroutines like Go does. I think D has the perfect foundation for gorutines by supporting actors [0] and fibers[1], they just need to be put together by someone clever on the topic.

[0] http://dlang.org/phobos/std_concurrency.html [1] http://dlang.org/phobos/core_thread.html#.Fiber

parennoob · on July 9, 2013

I don't understand much about processes and threads, but I do know a bit about queues. What is the meaning of this?

"Once a context has run a goroutine until a scheduling point, it pops a goroutine off its runqueue, sets stack and instruction pointer and begins running the goroutine."

Do they mean: "begins running the next goroutine"?