The problems with traditional multi-threaded concurrency go beyond just complexity and safety. Threaded designs also offer relatively poor performance on modern hardware due to the necessarily poor locality of shared structures and to context switching, both of which cause unnecessary data motion down in the silicon. Whether or not "message passing" avoids this depends on the implementation.
Ironically, the fastest model today on typical multi-core silicon looks a lot like old school single-process, single-core event-driven models that you used to see when servers actually had a single core and no threads. One process per physical core, locked to the core, that has complete ownership of its resources. Other processes/cores on the same machine are logically treated little different than if they were on another server. As a bonus, it is very easy to distribute software designed this way.
People used to design software this way back before multithreading took off, and in the high-performance computing world they still do because it has higher throughput and better scalability than either lock-based concurrency or lock-free structures by a substantial margin. It has been interesting to see it make a comeback as a model for high concurrency server software, albeit with some distributed systems flavor that was not there in the first go around.
Shared-nothing is great when you can do it. But sometimes the cost of copying is too high, and that's what shared memory is for.
Take, for example, a simple texturing fragment shader in GLSL. You're not going to copy the entire texture to every single GPU unit; it might be a 4096x4096 texture you're rendering only a dozen pixels of. Rather, you take advantage of the caching behavior of the memory hierarchy to have each shading unit only cache the part it needs. This is what shared memory can do for you: it enables you to use the hardware to dynamically distribute the data around to maximize locality.
I did not mean to imply that you are giving every process a copy of all the data. The main trick is decomposition of the application, data model, and operations such that every process may have a thousand discrete and disjoint shards of "stuff" it shares with no other process. The large number of shards per process means that average load across shards will be relatively balanced. The "one shard per server/core" model is popular but a poor architecture precisely because it is expensive to keep balanced.
However, in these models you rarely move data between cores because it is expensive, both due to NUMA and cache effects. Instead, you move the operations to the data, just like you would in a big distributed system. This is the part most software engineers are not used to -- you move the operations to the threads that own the data rather than moving the data to the threads that own the operations (traditional multithreading). Moving operations is almost always much cheaper than moving data, even within a single server, and operations are not updatable shared state.
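A minimal sketch of that "move the operation to the owner" idea, assuming Python threads as stand-ins for pinned per-core processes. The names `Worker`, `shard_of`, `owner_of`, and the shard and worker counts are all made up for illustration; a real engine would pin processes to cores and use a cheaper transport than `queue.Queue`.

```python
import queue
import threading

NUM_WORKERS = 4    # assumption: one worker per core
NUM_SHARDS = 64    # many more shards than workers, to even out load


class Worker(threading.Thread):
    """Owns a disjoint set of shards; no other thread touches them while it runs."""

    def __init__(self):
        super().__init__()
        self.inbox = queue.Queue()
        self.shards = {}  # shard_id -> {key: value}

    def run(self):
        while True:
            op = self.inbox.get()
            if op is None:  # shutdown sentinel
                break
            shard_id, key, delta = op
            shard = self.shards.setdefault(shard_id, {})
            shard[key] = shard.get(key, 0) + delta


def shard_of(key):
    # Deterministic toy hash so the example behaves the same on every run.
    return sum(key.encode()) % NUM_SHARDS


def owner_of(shard_id):
    return shard_id % NUM_WORKERS


def run_demo():
    workers = [Worker() for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    # Ship each operation to the shard's owner instead of fetching the data.
    for i in range(1000):
        key = f"user{i % 50}"
        sid = shard_of(key)
        workers[owner_of(sid)].inbox.put((sid, key, 1))
    for w in workers:
        w.inbox.put(None)
    for w in workers:
        w.join()
    # Safe to read directly now: every owner has stopped.
    totals = {}
    for w in workers:
        for shard in w.shards.values():
            totals.update(shard)
    return totals


totals = run_demo()
```

The only shared structure is each worker's inbox; the shard dictionaries themselves are never locked because only their owner mutates them.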
This turns out to be a very effective architecture for highly concurrent, write heavy software like database engines. It is much faster than, for example, the currently trendy lock-free architectures. Most of the performance benefit is much better locality and fewer stalls or context switches, but it has the added benefit of implementation simplicity since your main execution path is not sharing anything.
Don't you lose all of your performance gains in RPC overhead? How do you avoid latency in the data thread (do you have one thread per lockable object? won't that be more than 1 thread per core?) - these are the reasons lock-free is so popular.
> Don't you lose all of your performance gains in RPC overhead?
If one did, then why would anyone who knew what they were talking about (or even just knew how to write and use a decent performance test) advocate this method? :)
In a database engine, you ultimately need to move some data around, too. After all, you can't move a network connection to five threads at once, and not all aggregations can be decomposed into pieces that return small amounts of information back. Sharing memory often brings a quite substantial speedup.
Agree. But let's say each of these processes bound to a core needs to log important information. Well, that logging should probably be done on a separate thread. So, it is really just a matter of using multithreading where appropriate and not using it just to use it.
Sigh. What's wrong with using lock-free data structures?
Go study java.util.concurrent. It's one of the absolute best libraries ever written by some of the smartest programmers I have ever seen.
The primary question is "Do I really need to wait or do I just need to be consistent?" 90% of the time the answer is that consistent is good enough.
Lock-free data structures are not a panacea. They don't always do as well as locks in the face of contention. However, if you have that much contention, congratulations, you have an actual spot you really need to optimize.
By default, though, lock-free data structures protect you from so much fail it's ridiculous. I don't dread concurrent programming if I have a good lock-free data structure library.
That having been said, if you really have to wait (normally for hardware access), then you MUST do certain things. Your "lock" MUST be as small as possible--if it isn't "lock with timeout", "single small action that always goes to completion even if error occurs", "unlock"--YOU HAVE FAILED. START OVER. Also, note the "timeout" portion of the lock. "Timeout" MUST be handled and IS NOT NECESSARILY AN ERROR.
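For illustration, the "lock with timeout, one small action, unlock" shape might look like this in Python. The names `device_lock`, `register`, and `write_register` are hypothetical stand-ins for whatever hardware access is being guarded.

```python
import threading

device_lock = threading.Lock()  # hypothetical lock guarding a hardware resource
register = {"value": 0}         # stand-in for the hardware state


def write_register(value, timeout=0.05):
    """Return True if the write happened, False if the lock timed out."""
    if not device_lock.acquire(timeout=timeout):
        return False  # a timeout is an outcome to handle, not necessarily an error
    try:
        register["value"] = value  # the single small action
    finally:
        device_lock.release()  # runs even if the action raised
    return True


ok = write_register(7)

# Simulate contention: while the lock is held elsewhere, the write times out.
device_lock.acquire()
busy = write_register(9, timeout=0.01)
device_lock.release()
```

The caller decides what a `False` return means (retry, skip, report), which is the point of treating the timeout as a normal result.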
Now, these don't get all the situations. People who need "transactions" have hard problems. People who have high contention have hard problems.
However, I can count the number of times I genuinely needed to deal with contention or transactions on one hand and still have two fingers left over.
Whereas, I have lost count of the number of times that I cleared out all manner of bugs simply by switching to a lock-free data structure.
> Sigh. What's wrong with using lock-free data structures?
They don't compose.
You can't use them to, for example, implement a bank account system where you need to atomically remove from one account and add to another, unless there is special support for that in the implementation - and there can't be special support for everything built in.
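A small sketch of the gap being described, with assumptions stated up front: `Account` uses a per-operation lock as a stand-in for a lock-free structure, and the `observe` callback is a hypothetical hook that plays the role of another thread running between the two steps.

```python
import threading


class Account:
    """Each method is individually thread-safe."""

    def __init__(self, balance):
        self._lock = threading.Lock()
        self._balance = balance

    def withdraw(self, amount):
        with self._lock:
            self._balance -= amount

    def deposit(self, amount):
        with self._lock:
            self._balance += amount

    def balance(self):
        with self._lock:
            return self._balance


def transfer_composed(src, dst, amount, observe):
    # Two safe operations, but the pair is not atomic.
    src.withdraw(amount)
    observe()  # any other thread could run at this point
    dst.deposit(amount)


a, b = Account(100), Account(100)
seen = []
transfer_composed(a, b, 30, lambda: seen.append(a.balance() + b.balance()))
final_total = a.balance() + b.balance()
```

The observer sees a combined total of 170 in the middle of the transfer even though neither account was ever individually inconsistent; the total is back to 200 once both steps finish.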
A lock-free hash table most certainly composes, but it may not scale. That's why you use it instead of a lock.
This is exactly the kind of thing that hangs everybody up when programming concurrency.
Most people just want "multiple tasks executing simultaneously that occasionally have to exchange data". However, everybody beats them over the head with "must always have exactly consistent state".
I don't get your argument. You start off with a 'sigh' as if this is a trivially solved problem and anyone else is just being awkward, and then you admit that this only holds when you don't care about consistency, and on top of that, only when you don't really have a complex parallelism problem in the first place!
The actual answer to your question 'what's wrong with using lock-free data structures?' is really simple - what's wrong with them is that they don't provide atomicity, consistency nor isolation under composition.
Have you ever used one? Please go read all the stuff about java.util.concurrent. It holds for almost all lock-free libraries, but it's way better documented than anything else.
They provide consistency and atomicity. Lock-free data structures are always in a consistent state. The view they provide is always "atomic". It moves from one consistent state to another with no inconsistent state ever visible in-between.
They provide isolation under composition quite well. They compose far better than locks which can't even handle a simple inversion of order.
What they generally don't provide are guarantees of progress under high contention. The structure will never be wrong, but it may not be efficient.
The problem is that everybody equates "consistent" and "atomic" with "absolutely must exist in one and only one place at all times". And that's too strong for most uses.
Yes, if you are writing bank software, you want that transaction to ever only exist in exactly one place.
But, 99% of the time, that's too strong a guarantee.
Yeah I've read the literature and used lock-free data structures extensively. I've got a chapter in my PhD thesis about data-flow concurrency structures which were implemented using lock-free constructs, and I work in Oracle Labs on JVM research.
You've suggested a definition of consistency which is something like "could possibly exist in zero or more than one place at the same time". Well that's useless! Anything could be consistent with that definition!
So really I think you agree with me that you cannot compose multiple operations on a lock-free data structure and get any degree of atomicity, consistency or isolation. But you just don't think that matters for most programs, so you weaken the definitions until they could be met by anything.
> Yes, if you are writing bank software, you want that transaction to ever only exist in exactly one place.
Right so this answers your question!
> Sigh. What's wrong with using lock-free data structures?
Because - as you say! - they don't meet all requirements!
Finally - I don't think we even have lock-free implementations of all data structures do we? So it's not even always an option.
I'm terrified of lock-free stuff because it typically depends on nasty things like memory order, cache behavior and other subtle things, mostly nonportable and simply awful to get right on a new platform.
Also, I keep finding bugs in lock free structures. That's annoying.
Except that libraries are written by people, who have assumptions and probably limited budgets, and the sheer inventiveness of hardware engineers at making things like memory ordering different, or even buggy [game consoles are famous for this] . . . well, libraries aren't magic. You have to test them somewhere, and the failures are often subtle, happen sporadically, are hard to reproduce and are very difficult to debug.
I've used them. But in limited areas, where the win is great and they are explicitly called out as portability hazards, with alternate implementations of the areas in case the lock-free stuff goes pear-shaped. And from time to time we look at these things and wonder if the performance gain is (a) real, and (b) worth the headache.
> I'm terrified of lock-free stuff because it typically depends on nasty things like memory order, cache behavior and other subtle things, mostly nonportable and simply awful to get right on a new platform.
Not really. Most lock free stuff relies on a single compare-and-swap or memory barrier instruction. Even the cheap microcontrollers have them these days.
> Also, I keep finding bugs in lock free structures. That's annoying.
I recommend you leave programming now. :) I have yet to find a library that doesn't have bugs.
java.util.concurrent has some great functionality, but I have found that some of the concurrent datastructures cause significantly more garbage generation than their non-concurrent counterparts.
I was writing some highly concurrent code and initially wrote it with the jdk's lock-free containers, but found that GC was about 5-10x worse (depending on the workload) than simply wrapping the non-concurrent equivalent in a lock.
> I was writing some highly concurrent code and initially wrote it with the jdk's lock-free containers, but found that GC was about 5-10x worse (depending on the workload) than simply wrapping the non-concurrent equivalent in a lock.
That seems a bit high, but I can believe it under heavy contention.
But, you measured, and then optimized. Which is what people should be doing but aren't.
Lock-free data structures should have identical semantics (and, when uncontended, similar performance) to lock-based data structures with per-operation locks, except that they remove the temptation to hold the data structure lock for a long time.
The problem with locks is that people don't want one thread per object, and instead use one lock per object. This has the effect of making all object operations synchronous, which turns some potential race-conditions into deadlocks.
I've written a lot of multi-threaded code, but I don't think I've written ANY multi-threaded code that doesn't involve queues passing objects (hopefully safely) between threads. As in, no locks, no semaphores, no accessing shared state between threads (OK apart from an global flag structure that just set various error conditions encountered and was only interacted with by atomic get/set operations and where order of access was never important). Adding a lock to a program is like a huge red flag - stop everything and really think about what you are doing.
Some ivory tower nut proved in the 70s that semaphore-and-thread was logically equivalent to queue-and-message, i.e. the same results could be obtained by each.
But queue-and-message is superior in very many ways. Done carefully, you have only the queue as a shared structure. If message processors are non-blocking then the entire system is non-blocking. Deadlock can be statically determined by examination of the message vectors.
And last but certainly not least, in the debugger, all important state is in the message or local variables in a message processor. No enormous stacks to dive through, trying to find who did what to whom. Simple single-threaded message processors have straightforward logic. And a message-aware operating system can make the work queues transparent.
Totally agree. I have also written a lot of multithreaded code in the past years. At the beginning I really used locks and acted on various resources from multiple threads. It turned into an unmaintainable mess where you have to think about dozens of invariants before each change. These days I mostly implement things so that each piece of data is owned by exactly one thread (or, mostly, by an event loop running on a thread or in a thread pool) and provide thread-safe APIs on top of that by using queues and message (or function object) passing to exchange data and results between the calling thread and the resource-owning thread.
Sounds like they're just using the wrong tools. Eiffel's SCOOP model[1] made things a lot easier back around '95. They've been improving on it since, with a lot of great work recently [2]. Various modifications proved absence of deadlocks, absence of livelocks, or guarantee of progress. I believe a version was ported to Java. A few modern variants have performance along the lines of C++ w/ TBB and Go.
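One way to sketch that pattern in Python, assuming a hypothetical `OwnerThread` class: callers hand function objects to the thread that owns the state and get a `Future` back for the result. Constructing `concurrent.futures.Future` by hand is normally left to executors, so treat that as a shortcut for the example rather than a recommendation.

```python
import queue
import threading
from concurrent.futures import Future


class OwnerThread:
    """Runs submitted functions on the one thread that owns `state`."""

    def __init__(self, state):
        self._state = state
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            item = self._inbox.get()
            if item is None:  # shutdown sentinel
                break
            fn, future = item
            try:
                future.set_result(fn(self._state))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, fn):
        """Thread-safe entry point: queue fn(state) for the owner to run."""
        future = Future()
        self._inbox.put((fn, future))
        return future

    def close(self):
        self._inbox.put(None)
        self._thread.join()


def add_item(item):
    # Build a function object that will run on the owning thread.
    def apply(state):
        state.append(item)
        return len(state)
    return apply


owner = OwnerThread([])
futures = [owner.submit(add_item(i)) for i in range(5)]
sizes = [f.result() for f in futures]
snapshot = owner.submit(lambda state: list(state)).result()
owner.close()
```

Callers never touch the list directly; even the read at the end goes through the queue and returns a copy.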
What are the odds that it could be ported to a restricted use of C++ language, I wonder?
Note: Ada's concurrency strategy also prevented quite a few types of errors. They're described in this article [3] on ParaSail, a language designed for easy concurrency that goes much further.
People: There are solutions to this shit. If you're building a distributed system or need to deal with loss of data regardless, use actor systems (Erlang or Akka). If you need something that's not quite as probabilistic and are willing to handle deadlocks use CSP (Go or Rust). If you need absolute determinism and you're willing to pay for it in performance use SRP (Esterel or possibly maybe at your own risk Céu).
If you need shitstains in your underwear use locks and semaphores.
Agreed. Nowadays, you should not even think about threads. If you are, you're doing it wrong. You should only be concerning yourself with immutability or not sharing state. Hardly rocket science. Here's how easy it is in Go: https://github.com/Spatially/go-workgroup just write a function and send it data. Done. Granted, to do that distributed requires some thought for the remoting but still not a concern for the developer.
I get the feeling a lot of developers have only ever written one or two very specific kinds of software and think that their experiences generalise to all kinds.
Threading with locks isn't going to go away any time soon no matter how religiously one states their opposition to it. Take the case of mobile or indeed desktop app programming:
• Memory usage is important
• Real-time responsiveness is important
• Avoiding slow operations on the main thread is important
One consequence of these constraints is that if you have a simple actor-like design with a GUI thread (frontend) and a backend thread (network, other expensive operations) you can easily write crappy software. If the GUI needs a bit of data and needs it fast because the user just navigated to a new screen, sending a message to the backend actor and waiting whilst it finishes off whatever task it's doing isn't going to cut it. You need fine grained access to a subset of the data being managed by the backend, and you need it now, without yielding to some other thread that might not get scheduled quickly. And you need to avoid delays due to duplicating a large object graph then garbage collecting that immutable copy or (worse) running out of RAM and having your app be killed by the OS.
In some kinds of web server I've worked on, you're serving a large mostly-but-not-quite immutable data store. That data set must be held in RAM and you cannot have one copy for each thread because that'd not fit into the servers you use. And the data set must be hot-updateable whilst the server is running. You cannot just code around these requirements, they're fundamental to the product.
You can sometimes accept doubling your memory usage so the serving copy can be immutable whilst a new copy is created and updated, but sometimes that just makes you more expensive than your competitors.
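The "immutable serving copy plus a second copy being updated" approach could be sketched like this. `SnapshotStore` is a made-up name, and the lock-free read path leans on the assumption that rebinding a single reference is atomic, which holds in CPython but is not a general language guarantee.

```python
import threading


class SnapshotStore:
    def __init__(self, initial):
        self._current = dict(initial)        # treated as immutable once published
        self._write_lock = threading.Lock()  # serializes writers only

    def get(self, key, default=None):
        # Readers take no lock: they use whichever snapshot is current.
        return self._current.get(key, default)

    def snapshot(self):
        return self._current

    def update(self, changes):
        with self._write_lock:
            new = dict(self._current)  # the temporary second copy (the memory cost)
            new.update(changes)
            self._current = new        # publish by swapping one reference


store = SnapshotStore({"a": 1})
old = store.snapshot()
store.update({"a": 2, "b": 3})
```

A reader that grabbed `old` before the update keeps seeing the old data until it asks again, which is the consistency trade being made, and the full copy in `update` is exactly the doubling that may be too expensive.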
There are lots of types of programming where for performance, cost or other reasons, you simply cannot say "shared state is hard so we will never do it". You just have to bite the bullet and do it.
Actors/messaging doesn't imply a remote-backend architecture. It only implies messaging and mutually-exclusive _writable_ state.
GUIs are a poor example as they are inherently single-threaded frontends so performing simultaneous actions is already often implemented utilizing message-passing to backend threads which report back to a single-threaded event-loop. That architecture can be local, backend "threads" or remote processes, as it doesn't much matter to the GUI.
With regard to in-memory persistence, again, you almost certainly would not copy data per thread. For a situation like you're describing, a Redis-like architecture is all you need with a few atomic primitives. But, again, incredibly easy to implement in Go and is certainly _not_ rocket science.
Of course, someone is going to still be using locks, but it's a diminishing number of developers that need to code at that level since there are better higher-level techniques that serve many purposes and protect against many types of errors.
For distributed systems you are supposed to use Erlang with OTP or maybe Akka if you need more speed/JVM. Local and distributed concurrency are nearly indistinguishable when using actors.
The main problem with multithreaded programming is that most languages are clueless about threads and locks. They were an afterthought in C, and were viewed as an operating system primitive, not a language primitive. The language has no clue what data is locked by which lock. There's no syntax to even talk about that. Of course concurrency in C and C++ gets messed up.
Rust has a rational approach to shared data protection. Shared data is owned by a mutex, and you must borrow that mutex to access the data. You can have N read-only borrowers, or one read-write borrower. The borrow checker checks that at compile time. This gives us an enforceable way to think about who can access what.
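To make the "data is owned by the mutex" shape concrete, here is a rough Python analogue. `Guarded` and `locked` are invented names, and unlike Rust nothing here is checked at compile time: Python will happily let the reference outlive the `with` block, so this is a convention, not an enforcement.

```python
import threading
from contextlib import contextmanager


class Guarded:
    """Holds a value that is only handed out while its lock is held."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value

    @contextmanager
    def locked(self):
        with self._lock:
            yield self._value


counter = Guarded({"hits": 0})


def bump(times):
    for _ in range(times):
        with counter.locked() as data:
            data["hits"] += 1


threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with counter.locked() as data:
    hits = data["hits"]
```

The only way to reach the dictionary is through `locked()`, so the lock and the data it protects cannot drift apart the way a separate mutex and a comment can.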
> Shared data is owned by a mutex, and you must borrow that mutex to access the data.
This is how sane locking code works in C or C++.
The critical issue is that we have a lot of code already written which doesn't respect this simple rule.
90% of the time, that C/C++ code needs to be re-architected to make the locks either more comprehensive (fix races) or less (performance).
Rewriting in Rust generally achieves that, because it is a re-architecting and refactoring step with concurrency in mind.
And if done neatly, this leaves no room for the next guy to come in and undo parts of that design, because violating it takes more work than keeping it - the wrong approach is suddenly the harder one to get to, unlike C/C++.
The fundamental problem with this pattern in C++ is that you can always just keep around a pointer into it and then the "mutex guard" thing falls apart. There's no way to communicate to the language that this pointer can't hang around any longer than the mutex guard (because that's the only thing preventing the pointer from being racy).
You basically can't write safe abstractions in C++ in the face of pointers.
That's true. Which is why object membership is private by default in C++. You can't get the address of an object member unless the author expressly allows you. That's why I think the GP's complaints about pointers making safe shared data nigh-impossible in C++ were maybe overstated. The language/compiler has things to help you. Much more than C, anyway.
Yes, for big things it's a drag. Even for small things it's verbose to use and awful boilerplate to write. And I totally agree with the spirit of your argument that pointers and pointer aliasing makes concurrency harder in C/C++. Was just pointing out the standard well-known idiom for avoiding the problem. Indeed C++'s 'private' language feature allows the compiler to help you with it.
I got quite frustrated when I read this article. That's because this article, and the many others like this, confuse the real issue.
This article, and those like it, all state that the problem with multi-threading and synchronization is inherent to the programming paradigm/language/architecture you're using:
> "Buggy multi-threaded code creates race conditions, which are the most dangerous and time-consuming class of bugs in software"
> "because the traditional synchronization primitives are inadequate for large-scale systems."
Ok. Fair enough, now tell us why that is so.
I get quite annoyed when the author then proceeds to turn it all around by saying this:
> "Locks don’t lend themselves to these sorts of elegant principles. The programmer needs to scope the lock just right so as to protect the data from races, while simultaneously avoiding (a) the deadlocks that arise from overlapping locks and (b) the erasure of parallelism that arise from megalocks. The resulting invariants end up being documented in comments:"
> "And so on. When that code is undergoing frequent changes by multiple people, the chances of it being correct and the comments being up to date are slim."
Implying that the real problem with locks/threading/synchronization is actually communication, proper documentation discipline, programmer skill (soft and hard).
Of course I'm not saying that the process of using primitive synchronization methods can't be abstracted over to make it easier to write _proper_ multi threaded code. It's just that this really feels like subjective politicking, very much like the aversion to (proper use of) goto in C/C++ code.
This made me think. I can't actually agree that it's a matter of language primitives. It's very much about the logical complexity of sharing anything. It seems to be a fundamental, recurring theme in CS, data and logic. In a single thread, in an imperative language, there is a sequential list of logical steps and manipulation of data, or state. With two or more threads, you have multiple sequences of statements, but they can try to manipulate the same data; it's effectively mashing two sequences together in random order! If that is what you want to do, I don't see anything you can really do to make it not rocket science. I'm sure it can be made easier by using some safer primitives for data access, but I don't see how the high level logical races can be eliminated. As in, a language can eliminate an object being deleted when it's going to be still accessed, but it can't eliminate a student getting an F at midnight while his homework program is still being tested.
> In this approach, threads own their data, and communicate with message-passing. This is easier said than done, because the language constructs, primitives, and design patterns for building system software this way are still in their infancy
"Still in their infancy"? That's basically a description of Erlang's concurrency model, almost three decades old now.
Is there a concurrency equivalent of Spencer's law -- something along the lines of "Those who do not understand Erlang are doomed to reinvent it"?
Solid, maintained and nice is already there. As for speed, one must define their needs. Throughput, latency, asynchronicity and GC strategy (which is per-process and concurrent in Erlang) can all affect perceptions of "speed". Then you're given a multitude of ways to interface with C code, right down to treating a C program like it's an Erlang VM node.
The Erlang team is working on a JIT for the language, and the first release should be out soon. That should help with the "faster" part. "Nicer" is rather subjective. I happen to enjoy how Erlang looks and works, though it took me a couple of tries before I really liked it. I'm not sure why people don't jump on the idea of a bulletproof VM.
Doesn't seem as concise as an ML is one of my factors in "nice".
I guess I don't care about a bulletproof VM because I've not had issues with the JVM, CLR or other systems. And BEAM reeks of instability.
Furthermore the resulting apps seem to bear no relation to Erlang's robustness. Look at RabbitMQ. Lots of stability problems... So what's Erlang saving us from?
> Doesn't seem as concise as an ML is one of my factors in "nice".
Sorry, but that disqualifies most programming languages out there. It's not as concise as ML, but it's far more so than your average ALGOL dialect.
(Also, ML isn't as concise as Scheme.)
> And BEAM reeks of instability.
[citation needed]
> Furthermore the resulting apps seem to bear no relation to Erlang's robustness. Look at RabbitMQ. Lots of stability problems... So what's Erlang saving us from?
A sample size of 1, particularly one that's already known to be an infamous exception (RabbitMQ), isn't helping your case.
Apologies. I was thinking of HiPE, not BEAM. You're right, a lot of languages are disqualified. A lot just suck and have strange hacky stuff thrown around helter skelter. Erlang isn't that bad, but it is just questionably verbose and not fast.
And apologies, apart from playing a bit with some custom ejabberd code (which was terrible, but it was written by an inexperienced coder so nothing expected), RabbitMQ is the only high profile Erlang thing I've got experience with. I hear couch or something is also written in Erlang, but I've not heard of amazing HA, more than, say, Redis.
It's just nothing that Erlang has seems that special, apart from the process controller/network stuff. Which is straightforward to write in another language. The forced serialization, eh, I'd like the option to share memory when needed. Hot code patching? From what I understand, it has very strict compatibility requirements, and under those constraints, you can do it in another language. So apart from OTP, I really don't get the point. Erlang's big thing seems to be something that should be a library, not a language.
Other than the fact that it makes your BEAM files "non-portable" [read: can only be run on the arch that they were compiled on], what's unstable about HiPE in Erlang 16.x or later?
> Hot code patching? From what I understand, it has very strict compatibility requirements...
Eh? What requirements might those be?
I've used hot patching, but not in anger. The only requirement that I see is that callers must call the replaced code by its fully qualified (module:fun()) name in order to get the new code. What am I missing?
Riak is another piece of well-regarded code written in Erlang. (If you read Aphyr's writeup about Riak, note that it's from 2013!)
> It's just nothing that Erlang has seems that special, apart from the process controller/network stuff. Which is straightforward to write in another language.
So, like, where are the libs that do all of this in C++, Python, and/or Java? I'm not snarking here. If there exist bulletproof libs to do 99% of what you get with Erlang, then I really need to know about them.
So HiPE is stable now? I haven't used any Erlang for a few years, so I'm glad to hear it's fixed.
I understood that Erlang hot patching was only at a function level, or some other limitation that required much care? No guarantees when the old code will be unloaded? It's been years and I only skimmed things so perhaps I'm wrong and should shut up. But anyways, nothing stops you from doing the same in other languages. Indeed, asp.net does this. But it's just a light perf hack - in general I'm unsure of the usefulness vs more robust approaches. Where do you find it beneficial? (Only time I desire such stuff is to hand off control of a socket to a new server version, and there's nonintrusive ways of doing that.)
I remember seeing someone porting OTP to Java or .Net. Again, what are the language level features that prevents this? Or is it more of a lack of a commercial entity to back the beginning of such a project? Personally, having to serialize everything and getting "transparent" scalability by moving stuff over the network doesn't appeal to me. I'm guessing the rest of the world just runs a cluster of HTTP servers or other middleware and leaves it at that.
Maybe I've just had a terrible exposure to Erlang. Users of it seem to enjoy it, but so do many users. The main point seems to be that Ericsson made a good switch with it... Which seems as fallacious as pointing out that Facebook used PHP.
I mean, it looks like HiPE has been shipped with OTP since 2001 (three years before work started on Dialyzer), and it certainly is enabled by default on all supported platforms (of which x86 and amd64 appear to be two). Now, you do have to pass the "native" option to the code compiler to make it compile to native code, but you don't need to jump through any more hoops than that.
For the software I've written, bytecode-compiled Erlang was fast enough for me, so I haven't much experience with HiPE. I'm pretty sure that none of the documentation on erlang.org indicates that HiPE is experimental or unstable. What was wrong with HiPE when you used it, and when did you use it?
> I understood that Erlang hot patching was only at a function level... No guarantees when the old code will be unloaded?
Yeah, it's hot patching at a whole function level. This level of granularity almost doesn't require any care at all, actually... far less than if you could patch at a statement level [0]. Read the first several paragraphs of [1] to get a high-level overview of how hot code swapping (and code unloading) works. (The prose from "There are ways to bind yourself" onwards is not relevant to your interests, so you can stop there.)
> Where do you find [hot code loading] beneficial?
Whenever I have a service whose code I need to upgrade, and won't need any complicated data migration as a result of the upgrade. Hot code loading is not absolutely critical, but it's another useful tool in Erlang's high-availability toolbox.
> I remember seeing someone porting OTP to Java or .Net.
That's cool! :D What parts of OTP did they not port? Mnesia? The Erlang stdlib? (There's lots more to OTP than gen_server and friends.) Did they also port single-assignment variables, transparent-to-application-code IPC, and distributed code loading-and-execution, or was this just an OTP-the-library port and not a "Let's port some of the nicer Erlang/OTP runtime features, as well as gen_server and friends." project?
> Personally, having to serialize everything and getting "transparent" scalability by moving stuff over the network doesn't appeal to me.
There's no need for scare quotes. Erlang process distribution is transparent to program code. And, like, anyone writing distributed software has to be aware that accessing off-node data is almost always more expensive than accessing on-node data. It's a law of physics. You can't ignore it.
Anyway. As an application writer, Erlang's process distribution is also incredibly nice (until measurements demonstrate that it's too slow for your application, and you have to do a bit of redesign).
For most web app backend services, and a lot of web infrastructure, Erlang is more than fast enough, and it gives you the tools to trivially scale to meet increasing demand.
> The main point seems to be that Ericsson made a good switch with it...
The point of that example is that over 1.5 million lines of Erlang were used in a piece of telecom hardware that provided 99.9999999% uptime. (That means that the switch was down for no more than 31 milliseconds per year.)
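If you want to check the arithmetic on that downtime figure, here is a quick sketch. The function name is made up for illustration, and it assumes a plain 365-day year. Nine nines comes out at roughly 31.5 ms per year, in line with the figure above.

```python
# Assumption: a 365-day year; leap years shift the result only slightly.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def max_downtime_ms_per_year(nines: int) -> float:
    """Yearly downtime budget in milliseconds for an availability of N nines.

    For example, nines=9 means 99.9999999% availability, i.e. an
    unavailable fraction of 10**-9.
    """
    unavailable_fraction = 10.0 ** -nines
    return unavailable_fraction * SECONDS_PER_YEAR * 1000.0

nine_nines_budget = max_downtime_ms_per_year(9)
print(f"{nine_nines_budget:.1f} ms per year")
```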
Erlang isn't good for every project. Only people who don't know what they're talking about make that claim. Erlang and OTP do provide you with the tools to relatively easily make fault-tolerant, scalable software. Is it the only toolset that does this? Fuck no. But it is a pretty-well-thought-out, battle-tested, actively maintained one.
What tools do you use when you must write highly-fault-tolerant, scalable software?
[0] If we assume a moderately complex function, there are certainly people alive who could keep all of the interactions between the first half of the currently-running-code and the second half of the to-be-switched-to-code in their head. I'm not one of them.
The transparent part I put in quotes because going over a network isn't transparent. Even .net remoting can transparently create objects over the network. It's just a bad idea and better to be explicit. Like you say, the performance issues are simply a fact.
I've written some telecom stuff and ran the first VoIP oriented 911 service provider. We missed a single call in a year, and we followed up manually on that one (it was during a hard, scheduled, failover, and we were monitoring for call attempts). It's mostly a matter of testing and just assuming everything will fail. From having higher and higher level exception handlers, to assuming every connection, server, process -- anything-- will fail and making sure there's a failover path available. After that, there's monitoring, to try to prevent system wide cascading failures.
Looking back over VoIP stuff I did more recently, the availability rate is way, way lower. A huge chunk of the problems were just lack of testing or procedure. It's embarrassing, really. After that, lack of limits in order to prevent resource exhaustion was the second biggest problem. Failure of the runtimes/VMs was never an issue, across Windows and Linux, CLR and Mono.
How is Erlang going to help with logic errors more than any managed language that discourages state? I don't see how crash and retry fixes the majority of bugs. I can see how it's better than an unmanaged/unverified language where a single fault trashes the entire process, sure. But against JVM/CLR languages, say?
As far as a million lines with high uptime: the Linux kernel is pretty big and haven't people achieved high uptime with it? But I wouldn't consider that in favor of C, just that it shows it's possible in C. Is this an invalid comparison? I'd be more interested in Ericsson's engineering dept, but I imagine it's gonna be what we expect right? Heavy testing and specs?
And suppose I say OK, and move to Erlang. How is it going to maintain HA while pushing, say, a million packets a second of RTP traffic? Right now I just lb stuff out to various processing servers and call it a day. Even the guys I know that use Erlang, all the heavy lifting is C. Erlang's just the signal plane (even then, I wonder how they'd scale to handling DDoS levels of signaling).
As far as I remember, the process distribution part, it's just shuttling around serialized function calls over TCP, right? Not to demean it, just it's not a secret magic perf sauce, is it?
I'll give it another look, I've most likely missed something.
I notice that you haven't mentioned when you last used HiPE, nor have you mentioned what was unstable about it when you used it. I also notice that you haven't offered any sort of detailed description of or link to the OTP port that you saw some time ago. I'm genuinely interested in all of these things.
Every single time I have heard of "network transparency", whether it was from the Project Athena documentation out of MIT, OpenGL tomes, or CORBA programmer's guides, the author has said something to the effect of:
"Network transparency means that -to client code- access of local resources appears to be identical to access of remote resources; client code doesn't have to care where a resource is. However, access of non-local resources is bound to be slower (often substantially slower) than access of local resources. Be aware when writing performance critical code!"
Everything I've read defines network transparency in this way. Everyone I've talked to knows this definition and knows about its performance implications. Everyone I know who's not a freshly-minted web developer agrees that if you lack a certain level of experience, you have no business designing distributed systems. I'm not sure why you have such trouble with the term.
In regards to your VoIP service: Erlang/OTP provides battle-hardened monitoring, failover, exception handling, and the like. When you use Erlang, you don't have to write any of that, or fish for libraries of unknown quality to provide that functionality. That's one of the big things that's nice about the language and platform.
> Failure of the runtimes/VMs was never an issue...
Neither I, nor most folks who work with languages that run on a properly-developed VM often run into VM failures. If we did, we probably would stop using that particular faulty system. :)
> How is Erlang going to help with logic errors...
It helps by letting you write code only for the happy path. See my next comment.
> I don't see how crash and retry fixes the majority of bugs.
You might be confused about what it means to crash in Erlang. When you write Erlang, you code only for the cases that you must handle. This reduces the amount of code you must write and test. If a component of your software encounters unexpected or invalid input, or gets put into an unanticipated state, it crashes, the invalid state is lost, and the supervisor for that part of the system restarts the component. Folks sometimes talk about writing a crash-free error kernel [0] surrounded by code that dies when it runs into something unexpected.
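To make the crash-and-restart idea concrete, here is a very loose single-threaded analogy in Python. This is not how OTP supervisors are implemented (those supervise real processes and have configurable restart strategies); `supervise` and `Parser` are hypothetical names invented for this sketch. The worker only handles the happy path, and the supervising loop throws away a crashed worker along with whatever state it had.

```python
def supervise(make_worker, inputs, max_restarts=3):
    """Feed inputs to a worker; on any crash, discard it and start a fresh one."""
    worker = make_worker()
    restarts = 0
    results = []
    for item in inputs:
        try:
            results.append(worker.handle(item))
        except Exception:
            # The crashed worker and its (possibly invalid) state are dropped.
            restarts += 1
            if restarts > max_restarts:
                raise  # too many crashes: give up and let the caller decide
            worker = make_worker()
    return results, restarts

class Parser:
    """Happy-path-only worker: no defensive checks for bad input."""
    def __init__(self):
        self.parsed = 0

    def handle(self, text):
        value = int(text)  # raises ValueError on unexpected input
        self.parsed += 1
        return value

results, restarts = supervise(Parser, ["1", "oops", "3"])
print(results, restarts)
```

The design work mentioned above is in deciding where the boundaries go: what state a worker is allowed to lose, and what has to live somewhere that survives a restart.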
Do you get this for free? Yes and no. The process supervision code is built in. The modular software design to take advantage of the supervisory stuff you must do for yourself. You would likely end up doing very similar design work regardless of what language or platform you used.
> ...the Linux kernel is pretty big and haven't people achieved high uptime with it? ... Is this an invalid comparison?
It probably is an invalid comparison. And yeah, Ericsson's engineering department is likely full of good, disciplined programmers. However, most tools (like Erlang/OTP) that improve programmer productivity will improve the productivity of all but the very, very weakest of programmers.
> Even the guys I know that use Erlang, all the heavy lifting is C. Erlang's just the signal plane.
Yeah. As I understand it, that's the general pattern for high-performance systems. If you have really serious performance needs for your robust distributed (or simply fault-tolerant) system, do some perf tests, write the performance-critical parts in C or C++ or whatever, then use an Erlang Port Driver (or maybe some IPC mechanism of some sort) to connect them to the more difficult or logically tricky bits that are written in Erlang.
As I understand it, this is how the software running the AXD301 was designed. If you have an hour or so, you might be interested in [1]. It's an Ericsson presentation on the design of the switch in question. Some of it is stuff you undoubtedly already know about, but much of it is probably not. If you're looking for more, any one of the papers from Joe Armstrong on the proper way to go about designing Erlang systems is always a good read.
> As far as I remember, the process distribution part ... [isn't] a secret magic perf sauce, is it?
It was written by programmers much like you and me so -no- it's not secret or magical. The protocol is even partially documented [2]. What's more, AFAICT no one claims that it's the source of serious performance gains. It is -however- reliable, and -I gather- well understood, and backed by an active, talented development team.
Your sense of nice conflicts with Erlang's goal of supporting reliable systems. For example, Erlang is not designed using the paradigm du jour. It may seem like Erlang borrows from functional programming, but Erlang's designers invented all that independently because it supported writing reliable systems. Concise and maintainable are antagonistic qualities, and so Erlang's designers sacrificed all the conciseness they could get away with.
Erlang is not designed to be nice, elegant or have any other property people normally advertise their language as having. Erlang is designed so that programmers at Ericsson could write better code for phone switches, with constant feedback from said programmers. That's Erlang's mission statement, originally.
If you want concise actors, Scala with Akka allows you to write pretty unreadable code if you want to.
> Your sense of nice conflicts with Erlang's goal of supporting reliable systems.
Obvious counterexample: Elixir. Plenty of modern niceties that Erlang lacks (rubyesque syntactic sugar, scheme-style hygienic macros, the pipe operator, mix, etc), but shares most of Erlang's semantics and ultimately compiles down to the same bytecode, so AFAICS doesn't sacrifice any of the things that make Erlang good for building reliable systems.
I'm not sure if you're implying that Elixir's syntax makes it less good at developing for reliability than Erlang. If so, I'd appreciate if you could explain your thinking further -- if the existence of brainfuck proves that Erlang's syntax is better than Elixir's at developing for reliability, I'm afraid the nature of that proof is eluding me.
> Syntax is crucial when developing for reliability. Brainfuck trivially proves this. Both semantics and syntax need to support reliability.
This is an almost content-free comment. You could have made your comment substantially better by giving some examples of language syntax that makes it easier to write reliable code than unreliable code.
I think that we all agree that Brainfuck and Whitespace are two examples of languages that are very difficult to read. A comment along the lines of "Languages that are difficult to read do not support reliability." is also almost content-free and probably not entirely true. (Some people find C terribly difficult to read -for some reason or other- but very reliable systems have been written in it.[0]) :)
[0] Yes, a quint-squillion unreliable and even dangerous systems have also been written in C. This doesn't invalidate my point.
We're still pretty confident that Rust will be able to support a high quality green threading library and other M:N concurrency libraries without needing to be integrated into the language itself. All the type magic that we do to pass data between threads should work quite well for a green threading library. We already have a couple projects experimenting with this [1], [2], [3]. Servo also has a pretty nifty work-stealing queue [4] that is yet another abstraction for easily sharing data between threads.
It's much easier to use threads than to use them properly. Arguably, the strictures of Rust's type system make it harder to use threads. But they make it almost impossible to use threads improperly. Both are probably good things.
I have seen some real doozies writing multithreaded code. We had a relatively simple data analysis project that took in spectrum measurements from a piece of hardware, logged them, did some basic visualizations, and allowed for controlling the hardware. Each of these functions ran in one or more threads. Imagine my surprise when I saw lots of uses of CreateThread but nary a call to WaitForSingleObject or even EnterCriticalSection. I think there may have been a Boolean flag to "coordinate" a pair of producer/consumer threads.
The visualization end should have been in mostly only one direction and pretty much non-blocking ( except for when you're out of data ).
The other side - configuring the hardware - is my bread and butter, and there are a few simple abstractions that are pretty old now ( I have been using them since the late '80s ) that will help greatly.
Sadly, for inexplicable economic reason(s), these are less well known year by year.
Also - for extra points, think about how you'd do that in one thread. Betcha can... although multiprocessor can be pretty cool. Now write it to where it doesn't matter if it's one or multiple threads...
My first inclination, before putting locks everywhere, was to rewrite it as an event-driven application using WaitForMultipleObjects. I can't remember now why that didn't work.
It seems like the key to writing concurrent code is to abandon the idea of understanding how to construct a concurrent architecture and to figure out how to adopt a concurrency pattern which provides certain guarantees. Often it is something baked into the language, but frequently it is a library. This is one of the reasons I'm so thrilled with Clojure.
1) Because it has STM baked in and there is a core library for CSP.
2) Because it is a lisp so adding foreign syntax is as simple as a library and doesn't need to be a language extension.
The article's style got me into a ranting mood. I don't want "allusion links" that surface vapid text like "new superpowers" or "never ending stream of goodness". You are forcing me to click on them to know WTF you mean.
You get one guy, who is seemingly very smart, and he says basically "Don't do multithreading, it's very hard. Only an elite few, such as me, can do this right, so most of you out there DON'T DO IT!"
It's bullshit. Mainly because it's no harder than anything else, and has just as many pitfalls as every other type of programming. Yes, to a certain degree multithreading is hard, but it's not rocket science. But PROGRAMMING is hard. Not just multithreaded programming. There's nothing very special about multithreaded programming that should scare off people from trying it. Sure, you might fuck up, but that's
For example, our entire company was almost completely brought down a few months ago by our "architect" implementing a feature so poorly that it caused massive system instability. What was this feature? It essentially boiled down to a 1 or a 2. Customer accounts were tagged with either a 1 or a 2, and it's supposed to take a different code path for each, but he made it so fucking complicated, and he didn't do his due diligence, that the entire weight of his code caused significant downtime, and a customer that accounts for 60% of our revenues almost walked. And none of this is rocket science.
Of course, I worked at another company where one engineer thought "oh, asynchronous APIs are faster than synchronous APIs" so they implemented the entire API asynchronously. Of course, that required mutexes on the server side. And then more mutexes. And it got to the point where the performance was hell because of the unintended consequences of trying to make things faster. You would write a new API and the server would barf saying "You took the locks in the wrong order" but there was no indication of you ever doing anything wrong. It was a mess. So I get what the OP is saying, but it's not specific to just multithreadedness. I bet the same programmer would have made a mess of a single-threaded app as well. They are just shitty or careless programmers.
If you're careful, multithreaded programming is helpful and you can see some significant performance boosts from it. But like every other paradigm in programming, don't overuse it. A judicious use of simple multithreaded programming might help a lot, but there are few apps that benefit from an extremely complex system with hundreds of threads, massive amounts of mutexes, etc.
The reason I bristle about blog posts like this is that there are two wholly disparate types of multi-threaded programming:
First, there's the type that's hard and should probably be avoided except by supergeniuses. This involves big hairy lock graphs where locks are held across complex operations that may involve other locks, and swirling dependencies of doom. This shit is nasty, and I completely agree that you must be "this tall" to be trusted with it.
The other kind of multi-threaded programming is the simple kind, where you need to have a threadsafe interface to a module, the locking is dead simple if you know what you're doing, and your lock graph has about three states, all of which clear in constant time and have no dependencies. In this case, there is no excuse for not having this basic competency, and the attitude that we should all just throw up our hands and never hope to write multi-threaded code again is massively counterproductive. This shit is not brain-bendingly hard, it just takes a small amount of practice and discipline.
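For a sense of what the "simple kind" looks like, here is a minimal sketch in Python. `HitCounter` and `hammer` are hypothetical names made up for this example. There is one lock, it is only ever held for a short dictionary operation, and it is never held while calling out to anything else, so there is no lock ordering to get wrong.

```python
import threading

class HitCounter:
    """Module with a thread-safe interface: one lock, short critical sections."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = {}

    def record(self, key):
        # The lock guards a single short update and nothing else.
        with self._lock:
            self._hits[key] = self._hits.get(key, 0) + 1

    def snapshot(self):
        # Return a copy so callers never touch the shared dict directly.
        with self._lock:
            return dict(self._hits)

def hammer(counter, key, times):
    for _ in range(times):
        counter.record(key)

counter = HitCounter()
threads = [
    threading.Thread(target=hammer, args=(counter, "page", 1000))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.snapshot())
```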
Let's stop pretending all multi-threaded programming is wizardry.
The sign is near the ceiling. It's not a question of some people being taller than others. Nobody is that tall - not even dbaron (pictured in the photo), who is one of Mozilla's three Distinguished Engineers.
Carefully balancing swirling dependencies of doom doesn't make you a great programmer, at least not in the world of large-scale systems. Choosing the right design to avoid or eliminate those swirling dependencies is much more important.
That's fine, and I even agree -- perhaps nobody should venture into the swirling dependencies of doom territory with multi-threaded programming. However, my point is that this type of argument makes it seem as though nobody should attempt the more mundane, turn-the-crank, not-very-hard multi-threaded programming either, which is a bad attitude to perpetuate.
Multithreaded programming is not inherently more complicated than everything else, but finding bugs caused by multithreading is inherently more difficult than finding any other kind of bug.
It all comes down to this: You can't reliably find concurrency bugs via testing. You have to prove they don't exist or you will never know. "Don't do it!" is one way of proving that multithreading bugs don't exist.
But I find it annoying when people act like it is always down to choice. Sometimes we need shared mutable state for logical reasons. We can only avoid it where it is an implementation detail.
"The resulting invariants end up being documented in comments."
There's your problem. If you are going to use locks, you need a wider view of the system than you get at the source-code level. It is doable, but there is a big impedance mismatch between this approach to software development and agile methods.
It's really not that hard to write multi-threaded code. I just laugh when I read articles like this - I've been doing it for more than fifteen years now. By taking a tool like that away from your team you're stunting their growth and your product.
He makes it quite clear that they (i.e. mozilla) have tried to multithread things many times and the resulting complexity has led to bit rot and increased bugginess. In that context your boasting seems inflated and inane.
To what should I give more credence - an article in which someone says something is impossible, or large, multi-threaded applications at my workplace grinding through data around the clock?
I feel like a blacksmith who's been told horseshoes are impossible to make.
Thread-based concurrency has no future. It's complex and it doesn't scale beyond a certain point. Process-based concurrency is relatively simple (especially if your programming language has good async support) and it can scale indefinitely.
The one advantage of threads is that the overhead is lower when operating at low concurrency.
But it's like algorithmic complexity, people only care about growth in complexity not about the initial offset.
> Thread-based concurrency has no future. It's complex and it doesn't scale beyond a certain point.
Beyond what point? Every single company that handles petabytes of data, starting with Google, seems to be scaling just fine with thread-based concurrency.
And why shouldn't they? Thread-based concurrency has decades of study behind it, it's very well understood, the tooling is terrific (IDEs even tell you ahead of time when deadlocks can happen and when they do, they can tell you exactly why) and the performance is unmatched.
I'd say thread-based concurrency is going to be around for a while, as opposed to the many fads that come and go trying to replace it, starting with actor-based concurrency and transactional memory.
We haven't yet seen the limits of thread-based concurrency because CPUs have only just started to scale-out (by adding more cores). You're not going to experience any issues if you just have a small number of threads spread out over a small number of cores.
If you had like 100+ cores (just guessing) and several of them tried to access or write to a specific shared memory location, they would spend a lot of time busy-waiting for each other to finish (assuming you're using mutexes).
Maybe using semaphores could work but your code will end up looking like a mess.
With process-based concurrency, each process can only rely on its own memory pool, so that does use-up more memory, but processes are fully independent from each other (fully parallel) so no time is wasted busy-waiting for anything.
Nothing like replying to hyperbole with more hyperbole. Calling the actor model, which has been around for forty years and has been slowly gaining recognition, a "fad" is frankly ridiculous.
"Process-based concurrency is relatively simple (especially if your programming language has good async support)"
You can have "process-based concurrency... simple" or "good async support" but you can't have both... good async support becomes not simple. "Good async support" in a single process just means you pretty much have virtual threads anyhow, you just don't formally label them. That's why all the "good async" languages are basically structuring all their "good async" support so that you can write code that looks and behaves... virtually exactly like threads.
The problem with threads isn't the threads... it's the sharing patterns. Lots of nested locks is insane. But that's not the only way to do it. Even in Go it isn't that hard to do things sanely, and things like Erlang and Rust make it even easier.
If you've still only used those languages with "good async support", you really ought to try one of the modern languages that are the real competition. I have not heard of very many people becoming fluent in one of those and then wanting to go back.
Erlang and Elixir solved this problem. I only write multi-threaded code in very limited cases when it makes sense to split processing out of UI on mobile devices.
Everywhere else I use Elixir, and I write multi-process code and I don't think twice about it.
And I never run into problems.
I'm really feeling like people keep choosing tools that haven't solved the problem, or even tried to, and then thinking that the problem is perennial.
Certainly there's nothing stopping you in particular (except maybe the current lack of library support for it, but that could be fixed), not that Erlang is an especially good platform for the task.
So while some people seem to claim Erlang/Elixir may have solved the problem, they shouldn't lose sight of the fact that there are other domains than their own. Still, for a long time there has been support for message-passing systems of different kinds even in languages like C, C++, and their siblings, just not integrated into the language.
(I love Erlang/LFE personally, but wouldn't write a 3D shooter in it)
> However, these programmers aren’t fleeing concurrency itself - they’re fleeing concurrent access to the same data.
He's not wrong.
Modern real world example: the Golang authors designed the net library in such a way that everyone who uses it has to think about concurrent access to shared mutable state. Which is hard and unnecessary. Event loops never had this problem, but for some reason got labeled "non idiomatic" by Golang folks.
So I had to implement event loop myself.
>> In this approach, threads own their data, and communicate with message-passing.
This is the same paradigm as MPI, the Message Passing Interface. Using it, you also get for free the ability to deploy your "threaded" code on distributed memory architectures. But anyone with even a bit of experience with this standard can tell you how tedious it is to develop parallel code with it. Maybe this is a product of the paradigm, or just the verbosity of the API (see for example: http://www.mpich.org/static/docs/v3.1/www3/MPI_Alltoallv.htm...). I wish there was some sort of OpenMP or Intel TBB equivalent for MPI to ease the pain.
My dad said he had to read Hoare when he got his M.S. in the early '80s, and that half the people who read it didn't understand it, and half the people who understood it ignored it. It's 30 years later and people are still using crappy synchronization primitives.
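For what the quoted approach looks like at its smallest (this is plain Python threads and queues, not MPI), here is a rough sketch. The message tuples and the `owner` function are invented for illustration. One thread owns the dictionary outright and everyone else talks to it only through its inbox, so there are no locks in user code.

```python
import queue
import threading

def owner(inbox):
    """Thread that exclusively owns `totals`; others reach it only via messages."""
    totals = {}
    while True:
        msg = inbox.get()
        if msg[0] == "add":
            _, key, amount = msg
            totals[key] = totals.get(key, 0) + amount
        elif msg[0] == "get":
            _, key, reply = msg
            reply.put(totals.get(key, 0))  # answer on the caller's reply queue
        elif msg[0] == "stop":
            return

inbox = queue.Queue()
owner_thread = threading.Thread(target=owner, args=(inbox,))
owner_thread.start()

for _ in range(100):
    inbox.put(("add", "x", 1))

# Messages from one sender are handled in FIFO order, so the "get" sees all adds.
reply = queue.Queue()
inbox.put(("get", "x", reply))
total = reply.get()

inbox.put(("stop",))
owner_thread.join()
print(total)
```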
There is a relatively new actor-like language called paninij[1] which uses the idea of 'capsules'. I have been developing a java annotation based version of it called `@PaniniJ`. Capsule oriented programming enforces modular reasoning, which in turn allows the code to be transformed automatically into multithreaded goodness.
Event-driven with formally designated actors and protocols between actors is pretty old. '90s at least. An old tool, ObjecTime, deferred decisions about which actors went on what threads until the very last thing. Default was that they were round robin, run-to-completion.
It took some measure of work to do so , but people ran ObjecTime code on bare metal.
I am utterly ignorant of what Gecko looks like, but in largerish realtime embedded work, things always seem to end up in more formal design methodologies utilizing transactional models such as message sequences ( frequently expressed in charts. )
Coroutines are great stuff. Being able to yield for an async result and then wake up later, without having to do expensive and buggy lock rendezvous nonsense, is manageable and scalable.
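A tiny sketch of that yield-and-resume style, using Python's asyncio. The `fetch` coroutine and its delays are stand-ins invented for this example; `asyncio.sleep` simply plays the role of a slow async operation.

```python
import asyncio

async def fetch(name, delay):
    # Awaiting suspends this coroutine and lets others run; no locks involved.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # Both coroutines are in flight at once on a single thread.
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))

results = asyncio.run(main())
print(results)
```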
Shared-nothing message passing is an answer, as used in Erlang, but I seem to recall reading that race and deadlock can still occur, just at a higher level.
For example, thread 1 produces a value, and writes it into a member variable of a class. Thread 2 uses that value, but does not synchronize with thread 1 to make sure that it's been produced. But that's fine, because thread 1 finishes before thread 2 uses the value. That is, thread 1 almost always finishes first. But if it doesn't (if it loses the race), then chaos happens - chaos that is very hard to reproduce or debug.
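Here is a minimal sketch of that scenario with the missing synchronization added back, in Python. `Handoff`, `producer`, and `consumer` are hypothetical names for this example. Without the `ready` event, the consumer would read whatever happened to be in `value` at the time, which is usually the produced value and occasionally `None`.

```python
import threading

class Handoff:
    """A member variable written by one thread and read by another."""

    def __init__(self):
        self.value = None
        self.ready = threading.Event()  # signals that `value` has been produced

def producer(handoff):
    handoff.value = 42
    handoff.ready.set()  # publish only after the value is written

def consumer(handoff, out):
    handoff.ready.wait()  # block until the producer has finished
    out.append(handoff.value)

handoff = Handoff()
seen = []
# Start the consumer first on purpose: it must still see the produced value.
consumer_thread = threading.Thread(target=consumer, args=(handoff, seen))
producer_thread = threading.Thread(target=producer, args=(handoff,))
consumer_thread.start()
producer_thread.start()
consumer_thread.join()
producer_thread.join()
print(seen)
```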
It was scrollable for me using standard keyboard and mouse controls. I did not have to play find the scrollbar. Sounds like a local client issue to me on your end.
Edit: yeah no, still working just fine. Downvoting should not be used to confirm whether a problem exists or not. What's the deal today, Hacker News?
Move your mouse outside of the white blog area and try to scroll. That's the problem being addressed by making the white area take up the full width rather than 70% of it.