I'd argue the real problem here is that too many people are taught to rely on shared mutable state in concurrency. Mutable state and concurrency mix quite easily as long as you stick with something like message passing for your cross-thread communication. Erlang stands out as a stellar example of how successful that approach can be, but it can just as easily be done in C. Event-driven and client/server programming, as close cousins of the Erlang model, are concurrent even when they aren't parallel, and have given us a couple decades' worth of proof that good programmers don't really have a problem with it, and that even mediocre programmers can manage it. The point where everything really gets hairy is when the word 'lock' gets introduced into the mix.
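To make the message-passing point concrete, here is a minimal sketch in Python of the pattern described above: one worker thread owns a mutable dict outright, and other threads touch it only by sending messages through a queue. The names (`Counter`, `'incr'`, `'get'`) are invented for illustration; nothing here locks the dict, because only one thread ever sees it.

```python
import queue
import threading

class Counter:
    """A sketch of an 'actor': mutable state owned by exactly one thread."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._counts = {}  # mutable, but only _run() ever touches it
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg = self._inbox.get()
            if msg is None:  # shutdown sentinel
                break
            op, key, reply = msg
            if op == 'incr':
                self._counts[key] = self._counts.get(key, 0) + 1
            elif op == 'get':
                reply.put(self._counts.get(key, 0))

    def incr(self, key):
        self._inbox.put(('incr', key, None))

    def get(self, key):
        # Requests are answered on a private reply queue; since the inbox
        # is FIFO, the answer reflects every incr sent before this call.
        reply = queue.Queue()
        self._inbox.put(('get', key, reply))
        return reply.get()

    def stop(self):
        self._inbox.put(None)
```

The dict is as mutable as ever; what's gone is the *sharing*, and with it the locks.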
If we take, for the sake of argument, that the primary motivation for writing parallel code is some version of scalability or performance, then I don't think we should be so hasty to jettison mutability. After all, concurrency is only a means to an end. The real goal is performance, and it's generally much easier, where it's possible at all, to write performant code if you allow yourself some mutable state, however judiciously it is used. The hash table example from my earlier post is an elephant in the room here: hash tables give near constant-time performance. The closest anyone has ever gotten with a purely functional equivalent is O(log n), plus a healthy dose of cache-unfriendliness to boot. If that must be the cost of pervasive parallelism, perhaps we should be reconsidering how pervasive parallelism really needs to be.
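To see where the O(log n) comes from, here is a toy persistent map built as a binary search tree, sketched in Python. This is an illustration of the general technique (path copying), not any particular language's implementation: every insert copies only the nodes on the root-to-leaf path, so old versions stay valid, but every operation chases O(log n) pointers where a hash table does a single bucket probe.

```python
class Node:
    """One immutable tree node; updates build new nodes instead of mutating."""
    __slots__ = ('key', 'val', 'left', 'right')

    def __init__(self, key, val, left=None, right=None):
        self.key, self.val = key, val
        self.left, self.right = left, right

def insert(node, key, val):
    """Return a NEW tree; untouched subtrees are shared with the old one."""
    if node is None:
        return Node(key, val)
    if key < node.key:
        return Node(node.key, node.val, insert(node.left, key, val), node.right)
    if key > node.key:
        return Node(node.key, node.val, node.left, insert(node.right, key, val))
    return Node(key, val, node.left, node.right)  # same key: replace the value

def lookup(node, key):
    """Walk from the root: O(log n) pointer chases on a balanced tree."""
    while node is not None:
        if key == node.key:
            return node.val
        node = node.left if key < node.key else node.right
    return None
```

The structural sharing is what makes these structures safe to pass between threads, but each of those pointer hops is a likely cache miss, which is the cache-unfriendliness mentioned above.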
The problem with message-passing approaches is that you have to keep multiple copies of your data structures in sync, and that can be just as tricky as handling shared mutable data. The kind of functional data structures you find in languages like Clojure or Haskell can simplify these issues a lot.
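A small Python sketch of why immutability helps here: instead of keeping a remote copy in sync, the owner publishes immutable snapshots, and a reader's view is simply whichever value it holds; there is nothing to reconcile. The names are invented, and a plain tuple stands in for a real persistent structure, which would share structure between snapshots rather than copy.

```python
import queue
import threading

def owner(updates, published):
    # The owner's "state" is an immutable value: each update builds a new
    # snapshot and publishes it. Sharing a snapshot is just passing a reference.
    state = ()
    for key, val in updates:
        state = state + ((key, val),)  # a persistent structure would share here
        published.put(state)

def reader(published, seen):
    # Whatever snapshot the reader holds stays valid forever; later updates
    # produce different values rather than mutating this one.
    while True:
        snap = published.get()
        if snap is None:  # shutdown sentinel
            break
        seen.append(len(snap))
```

The sync problem dissolves because the copies never were mutable: a stale snapshot isn't out of sync, it's just an older value.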