Go's work-stealing scheduler

geodel · on July 18, 2017

The original design document for scheduler [1]. The author Dmitry Vyukov is famous for lockfree data structures implementation.

1. https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sL...

twoodfin · on July 18, 2017

One missing piece for me: How does the Go scheduler ensure it won't end up fighting the underlying OS scheduler? Are thread-affinity APIs enough?

Does the scheduler have any awareness of NUMA or NUMA-like topologies (I'm thinking here of hyperthreading) that could influence scheduling decisions?

jlouis · on July 18, 2017

An interesting tidbit from the Erlang world: We have CPU affinity controls (erl(1) options +sbt {db,tnnps}) which will spread schedulers (Erlang's name for a P) over a NUMA-like topology and keep track of the relative price of handoff/stealing between CPUs. Another option defines if you want load compaction and load balancing (+scl, +sub). Some workloads are more efficient if you can keep the work mostly on a few cores rather than spread the work over all available cores in the system because communication will be faster that way. If one core can process all work and get back to an idling state quickly, it usually beats spreading the load out due to caching.

For some workloads and hardware configurations, the gain from doing this is quite high because you can avoid schedulers jumping between cores and in the process destroying cache layers and TLBs.

OTOH, when running in a typical cloud environment, you are at the mercy of the underlying hypervisor and thus your load profile is different. In that case, affinity has little to no effect.

valarauca1 · on July 18, 2017

The answer is no.

Go-Lang heavily trashes your cache, and quickly moves things between cores.

---

There are underlying API's on the OS level that work well. But to my understanding Go doesn't leverage them. Also NUMA aware allocations are lacking.

pron · on July 18, 2017

Go's work-stealing scheduler is crude in comparison to Doug Lea's, which is a work of artful mechanical sympathy: https://github.com/netroby/jdk9-dev/blob/master/jdk/src/java...

geodel · on July 18, 2017

Seems like a reasonable take on Java Fork-Join shortcomings [1]. One of them is exceeding complexity which of course is very popular in Java world and the comment shows how Go version is 'crude' so apparently not good enough.

"The F/J framework classes -

1) have many, many levels of inheritance,

2) nested classes on top of nested classes,

3) instance variables used directly by other classes (known internally as “representation-level coupling among classes”), code from the Hackers Delight without comments about what it does,

4) homegrown deques and queues instead of standard Java™ Classes, and so much more."

1. http://www.coopsoft.com/ar/CalamityArticle.html

scott_s · on July 18, 2017

These are arguments against the implementation, not the actual utility of the framework. One can have ugly internal implementation (which sometimes is unavoidable because you're solving hard problems), but clean interfaces. Regardless, Doug Lea anticipated these complaints in the comments to his code:

     * Style notes
     * ===========
     *
     * Memory ordering relies mainly on VarHandles.  This can be
     * awkward and ugly, but also reflects the need to control
     * outcomes across the unusual cases that arise in very racy code
     * with very few invariants. All fields are read into locals
     * before use, and null-checked if they are references.  This is
     * usually done in a "C"-like style of listing declarations at the
     * heads of methods or blocks, and using inline assignments on
     * first encounter.  Nearly all explicit checks lead to
     * bypass/return, not exception throws, because they may
     * legitimately arise due to cancellation/revocation during
     * shutdown.
     *
     * There is a lot of representation-level coupling among classes
     * ForkJoinPool, ForkJoinWorkerThread, and ForkJoinTask.  The
     * fields of WorkQueue maintain data structures managed by
     * ForkJoinPool, so are directly accessed.  There is little point
     * trying to reduce this, since any associated future changes in
     * representations will need to be accompanied by algorithmic
     * changes anyway. Several methods intrinsically sprawl because
     * they must accumulate sets of consistent reads of fields held in
     * local variables.  There are also other coding oddities
     * (including several unnecessary-looking hoisted null checks)
     * that help some methods perform reasonably even when interpreted
     * (not compiled).

I find the "Calamity" article disingenuous. If you read more of the comments, Lea clearly explains the source of many of his data structures and gives an overview of how it all works.

cooper6 · on July 19, 2017

The article was written in 2010 just before the release of JDK1.7. The above comments are for the JDK1.9 release in late 2017. Doug has had some time to improve his comments. Perhaps you should look into the calamity part 2 article to see how well this "framework" with it's internal structure problems performs in JDJ1.8 streams.

ianopolous · on July 18, 2017

One should bear in mind that the author of that article has a strong vested interest against the JVM's ForkJoin framework succeeding, because they are senior at a company whose sole product is a competing implementation of a similar idea [1].

1. http://www.coopsoft.com/Products.html

cooper6 · on July 19, 2017

NO. The TymeacDSE product was released after JDK1.7. I took the proof-of-concept I gave Doug and made a professional product out of it. See the article "Is there a better way."

pron · on July 18, 2017

The remarkable thing about the implementation is the algorithm, not the programming style. Doug Lea's programming style has always been, and always will be, somewhat... idiosyncratic. Much of it is because he must take into account truly low-level details of compiler optimizations (and you can see that the implementation changes from one Java version to the other). He always warns people not to code like him. In any event, the beauty of the code (like in all of Doug Lea's code) is not in the programming style but in the algorithm and the attention to detail.

stcredzero · on July 18, 2017

The remarkable thing about the implementation is the algorithm, not the programming style.

How so? Is there an article where the algorithm is covered?

EDIT: There seem to be some links in the comment referenced.

jlouis · on July 18, 2017

There are times where a simple solution can beat more complex solutions. That said, Doug's work is indeed beautiful.

Another quite beautiful model is the one by Occam Pi, which has a certain simplicity ring to it:

https://www.cs.kent.ac.uk/pubs/2012/3210/content.pdf

nerdponx · on July 18, 2017

As someone who knows nothing about schedulers, care to explain?

bassman9000 · on July 19, 2017

More here: http://gee.cs.oswego.edu/dl/papers/fj.pdf

bassman9000 · on July 19, 2017

Neat

For the main tests, programs were run on a 30−CPU Sun Enterprise 10000 running the Solaris Production 1.2 JVM (an early version of the 1.2.2_05 release) on Solaris 7.

And slides: http://gee.cs.oswego.edu/dl/cpjslides/fj.pdf

samuell · on July 18, 2017

Stuff that Quasar leverages (or will leverage), I gather? :)

Any more details to share about the differences? Something that Go could readily learn from? :)

pron · on July 18, 2017

Not just Quasar, but pretty much any high-level concurrency framework (Akka uses it, too). Doug's algorithm is lock-free (except in some marginal features, that are not supported and not relevant for the Go implementation anyway) and displays exceptional scalability even on machines with a large number of cores. AFAICT, Go could copy Doug's implementation completely.

geodel · on July 18, 2017

Ofcourse. And Oracle would be so pleased with that like they always are with Google copying Oracle code. [1]

1. https://www.theverge.com/2012/4/19/2961128/google-chief-java...

pron · on July 18, 2017

I am not aware of any lawsuits involving Oracle over code used in accordance with its license. The Google lawsuit was over code that Google believed the license did not apply to, and they could therefore use not in accordance with the terms. In any event, the license of Doug Lea's code is creative commons.

RossBencina · on July 20, 2017

It's not even Creative Commons, it's CC-zero, which is public domain or as close as it gets. Strangely, the file that you linked has a GPL overlay:

  /*
   * This file is available under and governed by the GNU General Public
   * License version 2 only, as published by the Free Software Foundation.
   * However, the following notice accompanied the original version of this
   * file:
   *
   * Written by Doug Lea with assistance from members of JCP JSR-166
   * Expert Group and released to the public domain, as explained at
   * http://creativecommons.org/publicdomain/zero/1.0/
   */

https://github.com/netroby/jdk9-dev/blob/master/jdk/src/java...

I wonder where the original CC-zero version is located.

pron · on July 21, 2017

http://g.oswego.edu/dl/concurrency-interest/

teacpde · on July 18, 2017

> Each M should be assigned to a P. Ps may have no Ms if they are blocked or in a system call. At any time, there are at most GOMAXPROCS number of P. At any time, only one M can run per P. More Ms can be created by the scheduler if required.

So if every new Goroutine I spawn does an I/O operation that takes long time to finish, the scheduler will essentially spawn the same number of OS threads because every other one is blocked and no 'spinning' thread is available? If so, creating new OS threads seems like a lot overhead comparing to run a single thread in an event loop.

Matthias247 · on July 19, 2017

That's only happening for IO operations which get really blocked on syscall level, which means no async version is available. For the most important IO operations (sockets) the asynchronous variants are used internally. Which means if a read/write can not be performed immediately, the goroutine will register at the netpoller component (which is a wrapper around epoll/kqueue/iocp for IO status updates) for IO status updates, and then the goroutine will yield and wait until IO becomes possible again. The current thread can execute another goroutine until then. This means for goroutines blocked on network no extra OS threads are required. For other blocking OS operations (maybe file IO) extra OS threads are required.

snnn · on July 18, 2017

Tensorflow has the same thing from Eigen

signa11 · on July 18, 2017

there is also this paper: http://supertech.csail.mit.edu/papers/steal.pdf which also describes something similar.

scott_s · on July 18, 2017

For interested readers, there's a bunch of papers on the Cilk runtime's work-stealing scheduler. The paper signa11 pointed to is one of many. They are classics in the runtime literature. Other two great papers are "Cilk: An Efficient Multithreaded Runtime System" by Blumofe et al., PPoPP 1995, http://supertech.csail.mit.edu/papers/PPoPP95.pdf and "The Implementation of the Cilk-5 Multithreaded Language", Frigo et al., PLDI 1998, http://supertech.csail.mit.edu/papers/cilk5.pdf

omginternets · on July 18, 2017

>if not found, poll network.

Does this mean network IO is only performed when there are no idling goroutines? That can't be right... what am I not understanding, here?

Matthias247 · on July 18, 2017

I understand that network polling is only performed then. For network IO in general things look different: E.g. if you do a write() the runtime will first try to perform it synchronously in a nonblocking fashion. Only when that fails (E_WOULDBLOCK) then the executing goroutine is parked until the network poller executes and gets the information that IO is possible again.

omginternets · on July 18, 2017

So if I had GOMAXPROCS set to 1 and had two goroutines, one doing file IO and one doing

    var i int
    for { i++ }

the former goroutine would never actually perform the IO?

(I can't fight the feeling that I'm misunderstanding this on some level...)

echlebek · on July 18, 2017

Yes. The goroutine that increments i in a loop never has a chance to yield to the scheduler.

    package main
    
    import (
    	"fmt"
    	"time"
    )
    
    func main() {
    	go func() {
    		time.Sleep(100 * time.Millisecond)
    		fmt.Println("hi")
    	}()
    	var i int
    	for {
    		i++
    	}
    }

  $ GOMAXPROCS=1 go run test.go  # never prints anything

Matthias247 · on July 18, 2017

However that should happen only in a loop without function calls inside it. Afaik the newer Go versions will also yield in the function call preambles to the scheduler from time to time to give another goroutine in ready state the chance to run.

echlebek · on July 18, 2017

That's right. The newer releases of Go include more points where goroutines will yield to the scheduler.

I'm unsure about how that plays with inlining, though.

chrisseaton · on July 18, 2017

I don't know if the article is accurate or not, but I think you do have it backwards from what the article says.

I think it says that network IO is only performed where there are no runnable (not idling) goroutines.

loppers92 · on July 18, 2017

The author of this article is part of the Golang team (https://github.com/rakyll). I guess the correctness of this article is ok.

omginternets · on July 18, 2017

>I think it says that network IO is only performed where there are no runnable (not idling) goroutines.

Yes, that's what I meant. Got mixed up :)

Matthias247 · on July 18, 2017

> Ps may have no Ms if they are blocked or in a system call.

Is it really this way or the other way around? I would have guessed threads which are blocked in a system call are represented by an M which is not currently scheduled by the scheduler, which means that M isn't assigned to a P.

samuell · on July 18, 2017

I'm not super sure what was meant here, but I could add one point of info, FWIW: Go is creating extra OS threads for blocking system calls. In fact, sometimes more than one thread per such (for taking care of stdin and stdout, I think).

I have been able to create at least 4999 go-routines with blocking system calls though (Two OS threads were created per go-routine / syscall), so not a problem you will run into quickly.

stcredzero · on July 18, 2017

FWIW: Go is creating extra OS threads for blocking system calls.

One point for house Go!

dullgiulio · on July 18, 2017

I think "they" above refers to Ms, I don't see the contradiction with what you say. Do I miss something?

Matthias247 · on July 18, 2017

Makes sense then. For me "they" referred to Ps. But I'm not a native speaker. I would probably have expected "those" there.

knome · on July 18, 2017

It isn't a matter of your not being a native speaker. The statement is ambiguous.

43224gg252 · on July 18, 2017

[flagged]

sctb · on July 18, 2017

We detached this flagged subthread from https://news.ycombinator.com/item?id=14795445.

egeozcan · on July 18, 2017

You can hate Java all you want (a hate which I find confusing but it doesn't matter), but the beauty of the referenced code is very hard to match - given that you don't weirdly define beauty as the lack of braces.

LunaSea · on July 18, 2017

> the beauty of the referenced code is very hard to match

- 50 line comments containing type annotations in a strictly typed language and HTML

- 80 column limit

- single character variable names

- variable declarations inside loops

yeah ...

Edit: I like that people downvote without having counter arguments.

bcg1 · on July 18, 2017

All of your criticisms are kind of superficial and senseless:

- type annotations + HTML in comments - it is called javadoc

- 80 column limit - lowest common denominator for most people, also historically part of Sun code conventions for Java - http://www.oracle.com/technetwork/java/javase/documentation/...

- single character variable names - who cares, a symbol is a symbol

- variable declarations inside loops - more readable, no chance of referencing stale value outside of scope, and HotSpot will probably recompile anyhow

egeozcan · on July 18, 2017

I love it when the comments are rich as it helps the IDE, I love the great formatting achieved especially considering the column limit and I also love that the variable names are only single letter when they can be (a local context of a very clearly defined function).

I don't love the variable declarations inside loops and hadn't noticed them, but even if I did, I don't have an opinion about them. Maybe they aren't good for performance? If so, well, then you have a point but I can't say. I'm not a Java expert, I'm not even using it very often.

My first point in the comment you replied was "don't dismiss it just because it's a language you don't like, it actually looks beautiful" and you are arguing that (correct me if I'm wrong) it's not beautiful at all. I wouldn't down-vote you, but it's obvious that we differ on taste. Maybe I should have said that "the beauty of the referenced code is very hard to match if you ignore your bias against a language and appreciate the stylistic choices" as you would probably also agree that it is at least very, very coherent.

LunaSea · on July 18, 2017

Beautiful code should be readable and single character variable name salads are not readable.

If I'm a new contributor and just browsing the code I want to be able to stop on a line an understand what it roughly does just by reading that line without having to replay the whole execution stack in my head as required context.

I agree that the other issues are stylistic and are less important.

The algorithm might be beautiful but it's debatable for the implementation.

jsymolon · on July 18, 2017

> variable declarations inside loops

Depends on the standards. A lot of language standards say to declare vars near where you use them. Does make syntactical analysis easier. See scoping and namespace rules.

pron · on July 18, 2017

That's like judging a piece of literature based on typography and paper quality. Not that those don't matter, but they are secondary. The beauty of this code is in the algorithm it implements and in the mechanical sympathy it exhibits -- not its formatting.

Sean1708 · on July 18, 2017

> variable declarations inside loops

What's wrong with variable declarations in loops?

bluejekyll · on July 18, 2017

Nothing. There is not a performance hit in most languages, and it clearly articulates that the variable is only in scope for one iteration of the loop.

jsymolon · on July 18, 2017

What you are focusing on is perfectly idiomatic Java. I'll bet for most Java slingers, quite beautiful.

Other languages with other standards, yep, nasty.

Language wars are tiring, pedantic and really not part of this discussion on schedulers.

LunaSea · on July 18, 2017

> other standards

Or just standards at all.

I'm sorry but when you read lines like this:

> if ((i = (r - j) & m) >= 0 && i < n && (q = ws[i]) != null) {}

with 6 single character long variable names and 1 two character long variable name, there is something to be said about code quality.

And this is a rule that is true across programming languages.

This is a typical example of written once / never modified or single maintainer code and it's a bad practice, especially for open source projects.

buzzybee · on July 18, 2017

This is not a valid argument when the line in question is dealing entirely with abstract constructs(which it is) and all of the variables are function-local, declared on the very same screen, and several of them exist as aliases for more complex constructs. If you haven't encountered a need for this approach, you just haven't had to write really mathy code before. Mathy code derives little value from longer names, but it does benefit from density.

If you really think something is wrong with it, post ideas for longer names that will clarify and not substantially impact density. I would probably conventionally use "n0" and "n1" instead of "m" and "n" to represent "n-1", "n" - but the usage he has is consistent throughout the file, which is more than I can say of my stuff sometimes.

boomzilla · on July 18, 2017

You can read the comments (and the linked papers) first. This is an advanced algorithm that could take days (or weeks) to fully internalize the details. One can't expect to just read the code and build a mental model of the program in one parse, no matter how expressive the variable names are.

stcredzero · on July 18, 2017

Most language wars happen because people have trained themselves to techniques optimized for one programming environment's cost/benefit trade-offs, then they misapply those to a a programming environment with different cost/benefit trade-offs. Then they start pointless flame wars and pat themselves on the back for being "smart."

pron · on July 18, 2017

I can understand people who don't like Java for various personal aesthetic reasons, but I can't understand people who don't like Java but like Go. If you like Go but don't like Java, then either you don't know one or the other well enough, you get distracted by unimportant details and are expressing a "narcissism of small differences", or you're expressing an emotional bias that has nothing to do with technical facts. Seeing that you can't appreciate a low-level scheduling algorithm that would be pretty much identical in any language (this particular Java code is so low-level, that it's almost C) shows that at least the second reason holds.

Having used both Java and Go, the two seem almost indistinguishable to me (well, they differ in many details, but none that matters much, and their more pronounced differences have nothing to do with linguistic features); Java is more or less Go with generics and exceptions.

grey-area · on July 18, 2017

Java is like Go with generics, exceptions, inheritance (which the author of Java thinks was a mistake), and a separate runtime (JVM).

Those four things alone are pretty big differences, and then there are more intangible yet important differences like stdlib and culture, and implicit interfaces. They are definitely in the same family, but not as closely related as you imply - they are not indistinguishable. I'd say the similarities are more in unimportant stuff like syntax, some of the differences are important.

weberc2 · on July 18, 2017

I think one of the biggest differences is that Go has only value types and references thereto (including the builtin data structures, which are basically just fat pointers). Even if Java gets value types, there will be a huge and useless value/class dichotomy as in C#. But there are dozens of other misfeatures in Java, such as protected/final/first-class-constructors/explicit-interfaces/etc. And this is just the language--the tooling is even worse (and no, a huge, fragmented ecosystem of kitchen-sink tools is not "better" than a few, simple, purpose-made tools).

All that said, I have a lot of criticism for Go as well, especially as a language. I just can't see Java being on par with Go, even with generics. Of course Go has more than a decade of hindsight that Java didn't have (or at least Java was in too deep by then to make the requisite changes).

stcredzero · on July 18, 2017

But there are dozens of other misfeatures in...

Basically applies to most mature widely used languages that have evolved by adding features. In many software projects, the bias is on the side of adding too many features and not accounting for the costs, to the point where someone feels a bit of pain and complains. In Go, the bias is in not having enough, to the point where someone feels a bit of pain and complains.

I'd bet that more language designers are going to opt for the 2nd strategy as we go forward in the 21st century.

pron · on July 18, 2017

The dynamically linked runtime and the JIT are indeed important differences (although not all JVMs share HotSpot's design; some JVMs compile Java AOT and statically link the runtime) -- but they are not linguistic differences, and so nothing that the commenter I responded to could see in the code.

In any event, even regardless of how big you think the linguistic differences are, Java and Go are far too similar to like one and so strongly despise the other (although you could certainly prefer one over the other). We're not talking Go vs. Haskell, or Python vs. C here.

_ph_ · on July 18, 2017

Sorry, I have programmed in Java since 1995 and am now using Go for some years. There are huge differences, so I wonder how you could call those languages somewhat close. Some of the differences, which make me favor Go over Java are:

- functions: Java does not have functions, only methods

- closures: Not sure of the latest state, but until very recently, Java had no closures. Closures are a huge and important element of programming.

- single file executables: make your program easy to deploy, without shipping a JVM, setting up a classpath etc.

- value types: have control to which structures you want to refer by pointer and which should be treated as a value

- goroutines+channels: a far better approach to parallel programming, a goroutine is much more light-weight than a Java thread

- no generics the Java style: eventually Go needs generics, but not in the way Java implemented them.

- no frameworks: possibly not a strict language related issue, but Java grew monstrous frameworks, the Go community has been stay away from them so far.

pron · on July 18, 2017

I could list more reasons for which I prefer Java for large, serious projects (I prefer Go for smaller things) and some of the reasons you list are outdated, but that's not my point, as I don't wish to argue which of them is better (neither is, IMO), nor to focus on their differences. My point is that in terms of language and programming style, their similarities so outweigh their differences, that I cannot understand how one's opinion of both can vary by too much. This is doubly clear when you compare Java and Go to, say, Python, Haskell, JavaScript, OCaml, C++, Rust, Clojure, Scala, Eve, Erlang, Pony etc. If your experience has been mostly in Java and Go, you focus more on their difference, but in the context of all other programming languages that truly differ from those two, I find Java and Go -- at least in terms of linguistic features -- nearly indistinguishable.

_ph_ · on July 19, 2017

I have plenty of programming experience beyond Java and Go :). Currently I am mostly a Lisp programmer, that is why I consider the existence of closures and functions such a huge differentiator for Go vs. Java.

pron · on July 19, 2017

But Java has closures (including type inference), and they're at least as idiomatic in Go, not to mention that all Java collections have map/filter/reduce. As to functions, all Java requires is that they be defined as static members of some class, which is no different than defining them in some namespace. Thanks to static imports, they don't have to be qualified when called. So, not only is this not a huge differentiator, this is not a differentiator at all.

jrs95 · on July 18, 2017

Even though the languages may seem quite similar, the cultures of the two communities and how software tends to be written in each language is vastly different in my experience.

nicpottier · on July 18, 2017

Yes, totally agree.

In my opinion, Java is a great language that has was ruined by enterprise adoption. SOAP, JavaBeans etc, set this precedent of crazy over-engineered solutions in Java and it has never really recovered.

Go seems to attract C and Python programmers that prefer simpler interfaces and doing more with less. It is just a different culture, though I would agree from a language perspective they are very similar.

Which is great, choices are good and you can pick the culture you prefer, but you can only say the languages are similar if you ignore this huge chasm in their cultures.

euyyn · on July 18, 2017

Interesting!