Actors, Green Threads and CSP on the JVM

noelwelsh · on Sept 3, 2014

The argument seems to be that the actor model implementations on the JVM aren't all that fast and the language can't stop you from shooting yourself in the foot.

That's not really the issue as far as I'm concerned. Message-passing concurrency actually allows people be productive writing concurrent code that has a hope of working -- that is the main advantage. Occasionally things go wrong. Occasionally you need more performance and have to drop down to a lower level. But I can easily write a web service running on the JVM that gets 1000 requests/s per CPU with a bit of message passing concurrency. That's the point.

Heck, I don't even like the actor model -- I much prefer CSP -- but I'd attack it for other reasons (like lack of type safety).

yummyfajitas · on Sept 3, 2014

The lack of type safety is not really a flaw with the actor model. It is a flaw in the Any => Unit implementation of actors currently being heavily promoted by Typesafe, Inc (irony abounds).

A while back I wrote a type-safe variant of actors (I called them Agents). It's done on top of scalaz-streaming which also gives better performance than akka (e.g., x.map(f).map(g) doesn't involve 2 trips back to the threadpool).

http://www.chrisstucchio.com/blog/2014/agents.html

Another neat benefit is separating state changes from effects. E.g., if you have a IncrementCounterAndUpdateRedis agent, you can test it's counter incrementing functionality without ever touching redis.

MCRed · on Sept 3, 2014

Right, and when you "drop down to a lower level" to get more performance, the entire house of cards comes crashing down with bugs that are extremely difficult to debug.

The issue isn't actor model on a virtual machine-- erlang runs on a VM-- the issue is that the actor model is insufficient. You need pretty much most of OTP to build reliable, stable, concurrent systems.

This is why goroutines, everything on the JVM, etc are not going to work.

The sad thing is, I think the primary reason people are not using Erlang is they are afraid of the syntax, but even that reason is obliterated by the existence of Elixir (an even better language on the erlang VM)

gilbertw1 · on Sept 3, 2014

What do you mean by:

"This is why goroutines, everything on the JVM, etc are not going to work."

I've used Akka very successfully on the JVM for non-trivial clustered applications where performance is important, and I'm pretty sure the existence of most of google's infrastructure says that goroutines work at least a little bit.

Do you mean not work well? They won't gain traction?

jeremyjh · on Sept 3, 2014

For example, Go has no means of monitoring or linking to it's goroutines; without supervision trees the Erlang model for resiliency cannot be implemented.

village-idiot · on Sept 4, 2014

Let's break this down a bit. It's pretty clear to me that there are 3 reasons why you'd want to use CSP channels or actors.

1) Reliability. This is mostly going to be of the subscribing to other actors to make decisions when they die type. This is a complete replacement for try/catch, and probably isn't possible or desirable on the JVM due to mutability concerns. If you need and want this, Erlang is your bet.

2) Performance. This is probably the most common reason, and a true fiber system can hit this just fine on the JVM. So what if touching old code might cause degraded performance due to blocks, it's not unreasonable at all to say that high performance code will require some care. And if the fiber system warns you, the more the better. Quasar is your bet here.

3) Architectural niceness. Callbacks suck, and we all know it. CSP channels can be seen as a nicer way to structure the flow of a large asynchronous system. In this context I think core.async tends to be the best, because of it's support for transducers and Javascript support. Although Quasar/Pulsar would not be a bad second choice for this because they work outside of a go macro, assuming you only need them on the backend.

rdtsc · on Sept 3, 2014

> The primary advantages center around the ergonomics of concurrency.

That is how most "frameworks" or those that try to copy this aspect from Erlang see it. But the trick is in Erlang this actor pattern is used for fault-tolerance just as much. That was the goal right alongside concurrency initially. (The third one was priority of low latency responses, I believe).

That fault tolerance is harder to copy and that is why most libraries and frameworks give up and copy the "class object+thread+queue" and call it "We have a fast Erlang now".

The closest to get to something like Erlang fault tolerance wise is to use OS processes and IPC (via ZMQ), with some serialization. But it would be hard to run 2M of those on a reasonable machine.

Plus Erlang is not just the language (which is rather small and simple) but the whole set of helper libraries and tools. Including a distributed database, distributed application controllers. Support for rpc and so on. Supervisor patterns you can use etc.

heavenlyhash · on Sept 3, 2014

I can't help but wonder if the article of the author has seen Quasar: http://docs.paralleluniverse.co/quasar/ It seems to be a concrete refutation of his claims of impossibility. Quasar successfully brings green threads to the jvm. It includes both channels as are now popularized by golang, as well as higher level patterns like actors. Despite the young nature of the framework, benchmarks show it comparing reasonably well with both golang and erlang. Quasar also provides libraries that step all the way up into the OTP realms of supervision trees (though I haven't myself used this, yet).

The article mentions bytecode weaving, but dismisses it with very wavery justifications. Bytecode manipulation tools are a successful part of the jvm ecosystem. Frankly, they're part of why I consider the jvm ecosystem so successful: bytecode manipulation has allowed things like:

- third party tree-shakers/minifiers and obfuscators (i.e. proguard)

- cross compilers (i.e. robovm)

- concurrency libraries that DO have real green threads and continuations (i.e. Kilim, Quasar, and others)

- code coverage and complexity analysis tooling (i.e. jacoco)

- scala

- clojure

- groovy

- kotlin

- [... more languages ...]

There are two critical points about the above:

- All of these tools were built without direct cooperation with the compiler and core tool chain. That means experimentation and growth were possible from the community.

- Everyone's tools play nice with each other! You can use Quasar as a library in Clojure and then feed that bytecode into Proguard for minification, and then add code coverage instrumentation, and then feed it into Robovm!

Given the wild success of bytecode and bytecode manipulators, I have no idea how the article can so whimsically poo-poo the entire field.

(Yes, I'm well aware Erlang has a VM that allows alternate languages as well. And yes, Elixer is pretty. OT, no, I won't be making investments of my time into Elixer, because I like strong compile-time type systems, and Elixer doesn't have that.)

It is true that even in the presence of a full greenthreading tool like Quasar, code can call legacy APIs that still block a full thread, but this is not sufficient cause to dismiss the possibilities. To quote back part of the article, blocking will always be an issue in any cooperatively multitasked environment: "There’s no real way to limit what that code can do, unless it is explicitly disallowed from [...] looping." And yet I wouldn't claim Erlang fails to give me concurrency just because it still allows loops! Part of the compromise of cooperative multitasking is the very premise that in exchange for the higher performance possible from cooperative code, yes, poorly written code can suck up arbitrary amounts of CPU before yielding. If this were a practical concern, it would also be entirely possible for a bytecode instrumenting library to inject cooperative rescheduling points even into loops; and yet I have no real desire to see this feature.

Furthermore, I strongly object to the claim ForkJoin is "notorious for its overhead". All thread synchronization is notorious for its overhead. That's completely known to any programmer with experience in this area, and in no way unique to ForkJoin.

For an excellent, in-depth coverage of what exactly ForkJoin is and the problems it solves for you, see https://www.youtube.com/watch?v=sq0MX3fHkro . I highly recommend watching the entire thing despite its length, and even if you are not a JVM programmer -- even if you've been doing concurrent programmer for years, you will almost certainly walk away knowing significantly more about concurrent scheduling from the (relatively) high levels of memory fencing all the way down to CPU architecture choices and their impacts.

I'm not going to claim there are no issues with something like Quasar. In particular, I find that it is harder to operate in an ecosystem where very few existing libraries understand what your application is trying to do with green threads. Mostly, this doesn't phase me if my application is calling out to other libraries, because I control the scheduling one step above them (just like I would in a plainer actor framework without green threads like Akka). The problem is more with "hollywood" style frameworks -- the "don't call me, I'll call you" type -- so far it feels like these are very hard to use when your application is using green threading, but the calling framework has no clue about it. Some sort of interfacing code is required and usually has thread handoffs of its own, which can be moderately unpleasant, and limits your scalability at that juncture. But this is a present-tense bummer, and can be solved by patching (or outright replacing) these hollywood frameworks, or simply avoiding frameworks of that kind altogether.

But in short, I still think it's a bit unreasonable to dismiss the existence of ponies.

arielweisberg · on Sept 3, 2014

For me the caveat that calling a library or existing code can block the scheduler is a non-starter. I love what everyone is doing and I don't think it means they are useless, but for certain tasks and existing projects it means they are impractical to incorporate.

I think it's fair to say that what we are being offered is not ponies. It's a useful tool, but also a leaky abstraction and not what I want to be working with in the long run.

Also the statements about Erlang and loops is odd. My understanding is loops in Erlang are preemptible and that the VM bends over backwards to provide consistent scheduling even if it means losing some performance.

heavenlyhash · on Sept 3, 2014

It's true that Erlang has some capabilities of preemption, but now we're getting into an altogether more interesting range of details.

Erlang is still essentially cooperative and not preemptive, if I've understood my reading. That means the BEAM VM is doing something very similar to the style of instrumentation Quasar is doing: it injects yields into your code at points it thinks are reasonable. This is not quite the same thing as true preemptive scheduling as OS-native threads do. Quasar could do this kind of safepoint injection as well, though afaik that's not currently a feature.

Your definition of ponies and mine seems to diverge here, and that's fine :) I agree that true preemptibility is an even higher bar we can hold scheduling frameworks to. But it's also a very complicated area to get into, it's not completely without it's tradeoffs (full preemptibility pretty much gets us back to OS native threads, right? and there's very real performance reasons there's so much momentum away from that right now), and I also feel that I can get a lot done with green threads without these features. Maybe we'll see a growing swing towards safepoint injection for psuedopreemptibility -- I'm just making words up at this point, as far as I know; if there's a better existing terminology for these shades of grey I'd love a link -- in the coming years. I don't know where I place my bets on that, yet.

EDIT: this also appears to have been discussed before at https://news.ycombinator.com/item?id=7962838

pron · on Sept 3, 2014

Accidentally blocking the scheduler in Quasar is immediately detected and results in a warning with the exact stack trace of the offending operation. Also, that doesn't really "block the scheduler" but merely one of its threads. ForkJoin is more than capable dealing with occasional kernel threads blocking.

arielweisberg · on Sept 3, 2014

That's my point. I don't want a warning. I want it to just work ala Erlang. Injecting notifications to the threading framework that I might block is not ponies.

Out of curiousity how is Quasar detecting blocking?

To my knowledge ForkJoin doesn't detect blocking?

From the JDK 7 ForkJoin javadoc > However, no such adjustments are guaranteed in the face of blocked IO or other unmanaged synchronization

Sure the framework has extra threads and will work around it via work stealing, but you can get a lot of blocked threads at the worst possible time when you hit a correlated source of blocking.

You also lose thread affinity once work stealing kicks off.

Notifying a framework of potential blocking is certainly less nasty than what I do to work around blocking without one, but for an existing project of sufficient scope it's tough to transition.

pron · on Sept 3, 2014

AFAIK Erlang doesn't even warn you if you call blocking C code. The way Quasar does it is as follows: every time a fiber becomes runnable, it has a counter incremented. Every once in a while (I think 100ms) a special (kernel) thread goes over all FJ's worker threads and takes note of the fiber each is currently running and its counter (this requires some memory fences, but we take advantage of those already found in Quasar, so there's no added overhead). If it encounters the same fiber, with the same count twice, you've got a "runaway fiber", that's either blocking, or spinning too long. You can further examine the thread's state to see if it's blocked or not, to figure out which of the two things is happening.

Just to clarify: it's perfectly OK to call blocking code on Quasar fibers. In fact, it's encouraged. But the blocking call must be "fiber aware", and there's a project called Comsat, that takes many popular Java libraries and makes them fiber-blocking without changing their APIs.

This leads me to another point, which is time-slice based preemption of fibers. That's a feature Quasar had early in it's evolution, but has since been taken out (Quasar is preemptive, but doesn't offer time-slice scheduling). The reason is that time-slice scheduling is great when you have hundreds of threads running, but quite terrible when you have a million, because it means that the threads (lightweight or not) constantly compete for CPU cycles that the CPU just can't keep up with. In Java, plain threads are still available (with the same API as fibers, i.e. new Thread vs. new Fiber etc.), so for long-running computations, you're better off using a kernel thread; work-stealing scheduler aren't great at scheduling such tasks anyway. In Erlang, you don't have access to kernel threads, so time-slice scheduling is necessary to support the occasional heavy-computation process.

pron · on Sept 3, 2014

Author of Quasar here and apparently the target of the criticism in the article. It's kind of hard to make out the main claim the author has, but let me respond to the few more specific claims:

1. ForkJoin is not "notorious for its overhead". In fact, it is among the best implemented, best performing work stealing schedulers out there. Scheduling a task with ForkJoin takes a few nanos, and is almost as cheap as a plain method call. Don't take my word for it: go ahead and benchmark it.

2. Like Go, Quasar doesn't constrain the running code from mutating shared state -- if you're using Quasar from Java, that is. But it's still just as useful as Go, and when used from Clojure, it's even more flexible than Erlang, and actually quite safe.

3. My macbook isn't cruddy.

4. The stuff possible with Quasar, like running a plain Java RESTful service on fibers to gain a 4x increase in server capacity -- without changing the code and without even starting to parallelize the business logic with actor/CSP -- speaks for itself.

5. I'm not spreading FUD on threads -- you can watch my talk at JVMLS (linked in the article) to see my precise point: kernel threads cannot be used to model, one-to-one, domain concurrency, because the concurrency requirements of modern application (and the capabilities of modern hardware) exceed by several orders of magnitude the number of threads supported by the kernels. Fibers keep the (excellent) abstraction provided by threads as the unit of software concurrency, while making the implementation more suitable for modern soft-realtime workloads. When your average programmer can spawn up a (lightweight) thread without thinking about it -- say one for each request, and even many more, concurrency becomes a lot easier.

6. The linked Paul Tyma slide are completely irrelevant. I've got nothing against doing kernel-thread-blocking IO. The problem becomes writing simple, yet scalable code to process incoming requests. Modern hardware can support over a million open TCP sockets, but not nearly as many active kernel threads. Asynchronous libraries give you the scalability but fail on the simplicity requirement; fiber-blocking IO gives you both the performance and the simplicity of blocking code.

7. As to the "strawman benchmark" with "too many threads", the author is welcome to repeat the experiment using a thread pool with as few or as many treads as he'd like -- the result would be the same: switching kernel threads costs about 10-20us, while task-switching fibers costs 0.5us (and can be improved).

> few existing libraries understand what your application is trying to do with green threads

That's exactly the purpose of the Comsat project, which integrates existing third-party libraries with Quasar fibers. You're right, integrating "inverted" frameworks does require more work, but so far Comsat integrates, servlets, JAX-RS services and Dropwizard.

[1]: http://blog.paralleluniverse.co/2014/05/29/cascading-failure...

Blackthorn · on Sept 3, 2014

While you're here, can I just thank you for Quasar/Pulsar? Amazing piece of engineering work, every language deserves a threading library like that.

heavenlyhash · on Sept 3, 2014

> That's exactly the purpose of the Comsat project, which integrates existing third-party libraries with Quasar fibers.

Indeed! :) And Comsat is a hugely important part of Quasar's growing usability in real-world applications. (I'm using the servlet & JAX-RS code right now. So yeah, it's safe to say I'm thrilled about those integrations.)

Quasar also has great abstractions available if one needs to generate new bindings to any code which can currently produce callbacks: FiberAsync [1] is every bit as simple to use as the docs indicate.

But it is still slightly-more-than-none work required when dealing with hollywood frameworks that haven't already been adapted. It's totally manageable; at the same time, it's my personal hope in the long run we see more frameworks growing up that deal with green threading naturally.

[1] http://docs.paralleluniverse.co/quasar/javadoc/co/parallelun...

hrjet · on Sept 3, 2014

A tangential question:

Does any of this benefit a desktop app? I realize that most of the green-thread interest lies in asyc i/o and i/o bound workloads. But can a desktop app with a couple of dozen threads (i/o + cpu mix loads) gain something from Quasar?

heavenlyhash · on Sept 3, 2014

I'd say Yes.

A) Frankly, channels result in prettier, more maintainable code. I've seen enough questionable uses of LinkedBlockingQueue to last me a lifetime. Inability to so much as "close" a BlockingQueue in the face of multiple concurrent consumers is an unbelievable cramp -- it won't bother you until it does, but when it does, it's just a bellyflop-onto-concrete sort of sensation.

B) I'm even more pessimistic than Pron's sibling response about scalability of threads. A minecraft server with a even a few dozen concurrent players is starting to feel the limitations of naively scheduled threads, as an anecdote. Part of this comes down to the choices of concurrent data structures, how interaction with shared data strucutures is batch and the resolution of locks, the devil is in the details etc etc etc, but I'd venture that the abstractions with green threads and channels make good code a heck of a lot easier.

Truly trivial apps with one "compute" thread and one "UI" thread are unlikely to see serious performance gains. Similarly applications that have workloads that are highly parallel (say, somewhere around $num_cpus threads which exchange information only once every few hundred millions of cycles -- spitballing a bit, but for context I think the Doug Lea talk I linked in earlier (https://www.youtube.com/watch?v=sq0MX3fHkro) mentions thread unpark can take up to a million cycles in a worst-case scenario) are unlikely to see serious performance gains. So there are situations where green threading can't help you from a purely performance perspective, yes. But in practice, it's my observation that it's startling how quickly "simple" apps end up doing enough concurrent UI or network operations that naive threading starts getting unpleasant.

pron · on Sept 3, 2014

Quasar shines when there's a lot of inherent concurrency in the problem domain. This concurrency mostly arises on servers where you have many concurrent requests, but it's also common in simulations/games. If your domain has no inherent (large scale) concurrency, then the OS is more than capable (very) efficiently handling dozens or even hundreds of threads.

sgrove · on Sept 3, 2014

Quasar is an amazing piece of engineering work, and I really don't understand how pron has been able to do so much in such a relatively short period of time (including insights into the ideas and implementation).

jeremyjh · on Sept 3, 2014

Erlang loops will not block the OS thread because the BEAM VM implements preemptive multi-tasking at the (green) process level.

felixgallo · on Sept 3, 2014

Not quite. BEAM implements a cooperative multitasking environment using reductions that are checked at function calls. In practice you will rarely write code that runs forever without calling a function or returning, but it's possible, especially if you use NIFs.

This will generally cause scheduler collapse and result in all sorts of weird problems, so in the most recent version of the BEAM, there's 'dirty scheduler' support so that you can work around that problem if you have native code that runs for a long time (> 1 ms).

A good primer on all of this is http://jlouisramblings.blogspot.com/2013/01/how-erlang-does-....

kylequest · on Sept 3, 2014

> I can't help but wonder if the article of the author has seen Quasar

Yes, given that the post links to Quasar code :-) Take a look at what the "bytecode weaving" link points to...

voidfunc · on Sept 3, 2014

He links to quasar in the article...

heavenlyhash · on Sept 3, 2014

And yet I don't understand why the article is FUD'ing about it!

> hopefully without altering its meaning

...What?

I'm using Quasar right now. It works as advertised. Bytecode manipulation tools are not some scary unexplored arena of JVM tooling.

pkinsky · on Sept 3, 2014

I use Akka Scala/Java JVM actor framework daily, and it manages to get by without bytecode weaving. The lack of type safety is irritating, though.

jeremyjh · on Sept 3, 2014

It's very easy to mess this up though - either by blocking the thread or by capturing a reference to the mutable state of the actor in a closure.

jshen · on Sept 3, 2014

how often does it happen in practice?

InfiniteRand · on Sept 3, 2014

Very interesting article and analysis, although it would be nice if it explained what exactly a "Green Thread" is. From the article, I am guessing that a "Green Thread" is related to the lightweight low-level concurrency mechanism that he is referring to as the alternative to normal threads, but it is not exactly clear what that really means.

noelwelsh · on Sept 3, 2014

A green thread is a "lightweight" thread, meaning a thread that is managed by a user level process, not by the OS. This main advantage is you avoid OS thread context switch time, which is comparatively very large. The disadvantages are:

- you have to balance load across true OS threads to take advantage of multiple CPUs. (You often pin an OS thread per CPU.)

- if you make a call to a blocking OS function you have no way to pre-empt your lightweight thread.

HTH.

Update: "you have to balance load" --> I mean the green thread library implementer. Users of a lightweight threading library typically don't concern themselves with this, though they might if performance becomes an issue.

MCRed · on Sept 3, 2014

Neither of those disadvantages exist in the erlang system, as the scheduler spreads processes across OS processes. I can't speak for "Actor Model" libraries, though.

Cr8 · on Sept 3, 2014

Sure they do.

Erlang spreads processes across OS threads, like most other green-threading impls, and its not always great at it. (I don't know what you mean by "across OS processes." Processes on different Erlang VMs can communicate but the scheduler isn't going to move processes between them)

Calling into native code isn't an easy problem in Erlang either. NIF calls will block a scheduler thread, but the scheduler knows nothing about how long a NIF call is expected to take and will happily queue up processes to be run on a thread that is blocked inside a NIF call.

"Regular" erlang I/O is done by queueing up requests to be fulfilled by .. you guessed it, a pool of threads that spend most of their time sleeping in blocking i/o calls.

otterley · on Sept 3, 2014

What's old is new again?

http://en.wikipedia.org/wiki/Green_threads#Green_threads_in_...

noelwelsh · on Sept 3, 2014

Yes, in some ways. Usually you get a new concurrency model in addition to a new threading model, and more control than the original JVM green threads gave you.

playing_colours · on Sept 3, 2014

Can anyone advise please books / blogs / videos to learn the theory behind concurrency, threading, Green Threads, how they are implemented?

mjstahl · on Sept 3, 2014

To see the earlier work by Rob Pike on the ideas that would eventually turn into golang take a look at this paper:

http://www.cs.bell-labs.com/who/rsc/thread/newsquimpl.pdf

A good overview (of CSP) document is written by Russ Cox:

http://swtch.com/~rsc/thread/

Both the above articles are more focused on implementation as opposed to theory.

playing_colours · on Sept 3, 2014

Thanks!

jerven · on Sept 3, 2014

Wonder what the original author thinks of http://erjang.org/ or erlang on the jvm.

jallmann · on Sept 3, 2014

Erlang on the JVM as anything more than a toy is fundamentally misunderstanding Erlang -- which basically reinforces the article's central thesis. The article mentions the need for lightweight concurrency support to be baked into the platform -- the Erlang VM is the epitome of this. Idiomatic Erlang spawns a large number of isolated, concurrent processes, and lets them crash when things go wrong, with supervision trees to recover and restart processing. If a single JVM thread crashes, the whole VM goes. Additionally, you also lose secondary benefits such as per-process heaps/GC, etc. These things are impossible to cleanly graft on to the JVM.

mike_hearn · on Sept 3, 2014

Er, you can easily write a Java thread that just terminates or restarts itself if it crashes ...

mateuszf · on Sept 3, 2014

Anyone using core.async knows how this article relates to it?

mossity · on Sept 3, 2014

The points about the JVM not being able to guarantee that your code won't block the thread apply; it's left up to you to do it. This doesn't come as any surprise to me, nor I would guess to most users of the library, so I'm not sure this is really that damning. Core.async doesn't use bytecode weaving or fork/join, so those criticisms don't specifically apply.

MCRed · on Sept 3, 2014

It's damning because, rather than outsource the issue like the library makes you think you are doing (or like anyone writing erlang code actually is doing) you still have to deal with the hassle and the risk, so you're not really buying much.

village-idiot · on Sept 5, 2014

Except you're using it Clojure, which makes it easier to avoid that kind of stuff and easier to spot when you count on Java APIs directly.

nathan7 · on Sept 3, 2014

I'd say that many of the issues surrounding accidentally using mutable state are fairly moot in Clojure. Blocking still remains an issue, though there's core.async/thread for go blocks that block so that they're executed on a thread of their own.

derengel · on Sept 3, 2014

It applies equally, core.async is a CSP implementation on the JVM, it would be interesting to know how core.async relates to this issue on the browser instead of the JVM.

_random_ · on Sept 3, 2014

So, CLR and Mono would be better?

PS: at least not the V8/Node.