There seems to be a lot of voodoo beliefs around concurrent programming that lea...

LegionMammal978 · 2024-10-02T16:19:22 1727885962

> However, it's a little baffling to me why it's so hard for devs to grasp how to properly maintain shared state. Instead of just learning the basic rules, it gets couched in "It's just too hard to understand so keep adding things until it works".

Probably because most people aren't aware that there are basic rules to be learned. I'd imagine the typical experience is, you're very familiar with single-threaded code, and now you're trying to let other threads work with your data. You have heard that there are many pitfalls, and that there are special-purpose tools like mutexes to avoid those, but you look at the examples and find them mostly baffling. "Why do they perform these incantations for this data but not that data, or in this place but not that place?" So you come up with some weird mental model and move on with your life, never aware that there are underlying principles for maintaining shared state.

Personally, I didn't understand mutexes very well at all, until I started looking into what the atomic memory orderings from C++ et al. were supposed to mean.

Maxatar · 2024-10-02T16:36:51 1727887011

Not too sure what the basic rules are and I'm not able to find any list of such rules.

For me the biggest challenge when sharing state is that the only benefit I can see for parallelism is performance, so if I'm not gaining performance there is no reason to use parallelism. If I use coarse-grained mutexes then I end up with straight forward to reason about code but I lose the performance benefit and in fact can end up with slower than single threaded code.

If I use very fine grained mutexes then I end up with faster code that has very hard to find bugs that happen on very rare occasion.

And then on top of that even if you do write correct fine grained locking, you can still end up with slow code due to cache behavior such as false sharing and cache coherence.

So ultimately I disagree that writing parallel code is simple unless you're willing to give up performance in which case you may as well just stick to single threaded code or use parallelism among independent data. Writing correct parallel software that shares state and actually delivers substantial performance benefits is incredibly difficult, and I am skeptical that there is a set of simple rules that one can simply read about.

tialaramex · 2024-10-02T16:51:38 1727887898

> Not too sure what the basic rules are and I'm not able to find any list of such rules.

The actual rules are completely terrifying because they involve the physics of microprocessors. If you've watched Grace Hopper's lectures where she gives out physical nanoseconds (pieces of wire that are the same length as the distance light travels in a nanosecond, thus, the maximum possible distance data could travel in that time) you can start to appreciate the problem. It is literally impossible for the intuitive Sequentially Consistent model of how computers work to apply for today's fast yet concurrent processors. Light is too slow.

However generally people mean either Java's memory model or the C++ 11 (and subsequently 14, 17, 20) memory models used in languages such as C++, C and Rust. Those rules are less terrifying but still pretty complicated and the programming language promises to somehow provide an environment where these rules (not the terrifying ones) are all you need to know to write software. So that's nice.

It can be simple to write parallel code for a language designed to make that easy. Yes even if there's shared data. It only started to get trickier if the shared data is modified, so long as it isn't we can make copies of it safely and modern CPUs will do that without actual work by the programmer.

cogman10 · 2024-10-02T17:16:59 1727889419

Are there popular languages that don't have memory models which make reasoning about concurrent models easier?

A language with a notion of threading and shared state is going to have something akin to read/write barriers built into the language memory model to tame the beast.

jcranmer · 2024-10-02T19:46:33 1727898393

I think tialaramex is overselling the complexity of concurrent memory models in practice, at least for end users. In reality, all modern memory models are based on the data-race-free theorem, which states that in the absence of data races--if your program is correctly synchronized--you can't tell that the hardware isn't sequentially consistent (i.e., what you naïvely expected it to do).

Correct synchronization is based on the happens-before relation; a data race is defined as a write and a conflicting read or write such that neither happens-before the other. Within a thread, happens-before is just regular program order. Across a thread, the main happens-before that is relevant is that an release-store on a memory location happens-before an acquire-load on that memory location (this can be generalized to any memory location if they're both sequentially-consistent, but that's usually not necessary).

The real cardinal rule of concurrent programming is to express your semantics in the highest-possible level of what you're trying to do, and find some library that does all the nitty-grityy of the implementation. Can you express it with fork-join parallelism? Cool, use your standard library's implementation of fork-join and just don't care about it otherwise.

xxpor · 2024-10-02T22:48:04 1727909284

tialaramex · 2024-10-02T23:48:30 1727912910

C has the same model as C++ from the same era, so C11 is the C++ 11 model, C23 is C++ 20 and so on.

It's C so you don't get a comprehensive set of bells, whistles and horns like the C++ standard library, but the actual model is the same. At a high level it's all the same as C++ 11, the details are not important to most people.

cogman10 · 2024-10-02T17:03:24 1727888604

> Not too sure what the basic rules are and I'm not able to find any list of such rules.

I'd suggest the book in my original comment, Java concurrency in practice.

> If I use very fine grained mutexes then I end up with faster code that has very hard to find bugs that happen on very rare occasion.

I agree this is a real risk if you are doing fine grained mutexes. But the rules are the same whether or not you want to follow them. If you have shared state (A, B, C) and you want to do a calculation based on the values of (A, B, C) then you need a mutex which locks (A, B, C). Certainly, that become a problem if you have calculations that just require (A, C) and you might want to avoid locking for B. In that case, you need a more complicated mechanism for locking than just simple mutexes which is certainly easy to get wrong. When the (A, B, C) actions happen you have to ensure that the (A, C) actions can't happen at the same time.

This isn't a complicated rule, but it is one that can be hard to follow if you are trying to do super fine grained locking. It's even trickier if you are going to abuse the platform to get correct results.

But fine v coarse isn't the problem I'm referring to when I say people get the simple rules wrong. Rather, than worrying about fine vs coarse grained locking, I very frequently see code where mutexes and concurrency primitives are just peppered everywhere and haphazardly. We might call that super coarse grained.

SJC_Hacker · 2024-10-02T17:49:34 1727891374

> For me the biggest challenge when sharing state is that the only benefit I can see for parallelism is performance, so if I'm not gaining performance there is no reason to use parallelism.

Aside from performance, another very common reason is to not lock the UI from the user. Even in UI-less programs, the ability to abort some operation which is taking too long. Another is averaging out performance of compute tasks, even in the case where it would be faster to handle them sequentially. Without some degree of parallelism these things are not possible.

Consider a web server. Without parallelism every single request is going to completely lock the program until its complete. With parallelism, you can spawn off each request, and handle new ones as they come in. Perceived performance for majority of users in this case is significantly improved even in the case of single processor system - e.g. you have 99 requests which each take a single second, and then one which takes 101 seconds. Total request time is 200 seconds / 100 requests = 2 seconds average per request, but if that 100 second request comes in first, the other 99 are locked for 100 seconds, so average is now > 100 seconds per request ...

Maxatar · 2024-10-02T20:20:59 1727900459

>Aside from performance, another very common reason is to not lock the UI from the user.

This is not a good fit for parallelism, this is pretty much always accomplished using concurrency ie. async/await.

int_19h · 2024-10-02T23:50:43 1727913043

Assuming that the APIs & libraries that you need are async. Which is, unfortunately, not always the case for historical reasons.

SJC_Hacker · 2024-10-09T17:37:49 1728495469

async/await uses threads underneath. And sometimes even that paradigm is not sufficient.

zozbot234 · 2024-10-02T16:57:38 1727888258

> Not too sure what the basic rules are and I'm not able to find any list of such rules.

You may want to consider https://marabos.nl/atomics/ for an approachable overview that's still quite rigorous.

tombert · 2024-10-02T17:16:59 1727889419

+1 for the Java Concurrency in Practice book. It's the book I recommend to nearly everyone who wants to get into concurrent programming. Goetz makes it a lot more approachable than most other books.

hinkley · 2024-10-02T17:24:37 1727889877

Goetz has come a long way. I knew one of the people who contributed to that book and he was a little frustrated about having to explain things to him he felt he shouldn’t have had to. The implication was he’d already had this conversation with some of the other contributors.

Sometimes though, the newbie is going to write the clearest documentation.

hinkley · 2024-10-02T17:22:57 1727889777

I loved concurrent code when I was starting out. I’d taken a pretty good distributed computing class which started the ball rolling. They just fit into how my brain worked very well.

Then I had to explain my code to other devs, either before or after they broke it, and over and over I got the message that I was being too clever. I’ve been writing Grug-brained concurrent code for so long I’m not sure I can still do the fancy shit anymore, but I’m okay with that. In fact I know I implemented multiple reader single writer at least a few times and that came back to me during this thread but I still can’t remember how I implemented it.

tombert · 2024-10-02T18:49:27 1727894967

That's something I'm afraid of for my latest project. I did some concurrent stuff that wasn't 100% clear would actually work, and I had to write a PlusCal spec to exhaustively prove to myself that what I was doing is actually OK.

It works pretty well, and I'm getting decent speeds, but I'm really scared someone is going to come and "fix" all my code by doing it the "normal" way, and thus slow everything down. I've been trying to comment the hell out of everything, and I've shared the PlusCal spec, but no one else on my team knows PlusCal and I feel like most engineers don't actually read comments, so I think it's an inevitability that my baby is killed.

f1shy · 2024-10-02T16:22:48 1727886168

Maybe because I had a complete semester of multiprogramming in the uni, I see almost trivial to work in such environments, and cannot comprehend why is so much mystic and voodo. Actually is pretty simple.

tombert · 2024-10-02T18:54:13 1727895253

I feel like it's not terribly hard to write something that more or less works using mutexes and the like, but I find it exceedingly hard to debug. You're at the mercy of timing and the scheduler, meaning that often just throwing a breakpoint and stepping through isn't as easy as it would be with a sequential program.

I feel like with a queue or messaging abstraction, it can be easier to debug. Generally your actual work is being done on a single thread, meaning that traditional debugging tools work fine, and as I've said in sibling comments, I also just think it's easier to reason about what's going on.