When your loop body uses complex library APIs over complex data, it's still hard in C++ to be confident that everything is thread-safe and that you're avoiding data races.
Maybe it's not so hard if you're in a domain like HPC where the libraries you use are designed specifically to be used with data parallelism. But when you're pulling together code from different sources that may or may not have been used in an aggressively parallel application before...
I think it's less about libraries and more about the general approach to programming.
In the HPC world, software is usually doing one thing at a time. Most of the time it's either single-threaded, or there are multiple threads doing the same thing for independent chunks of data. There may be shared immutable data and private mutable data but very little shared mutable data. You avoid situations where the behavior of a thread depends on what the other threads are doing. Ideally, there is a single critical section doing simple things in a single place, which should make thread-safety immediately obvious.
You try to avoid being clever. You avoid complex control flows. You avoid the weird middle ground where things are not obviously thread-safe and not obviously unsafe. If you are unsure about an external library, you spend more time familiarizing yourself with it or you only use it in single-threaded contexts. Or you throw it away and reinvent the wheel.
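As a rough illustration of that style, here is a minimal C++ sketch: the input is shared but read-only, each thread accumulates into its own local variable, and one mutex-protected line merges the results. The function name `parallel_sum` and the chunking scheme are assumptions made for this example, not something taken from a particular HPC codebase.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example: sums `data` using `num_threads` workers.
// `data` is shared immutable input; `local` is private mutable state;
// `total` is the only shared mutable state and is touched in one place.
long long parallel_sum(const std::vector<int>& data, std::size_t num_threads) {
    if (num_threads == 0) num_threads = 1;  // avoid dividing by zero below

    long long total = 0;     // shared mutable state
    std::mutex total_mutex;  // guards `total`
    std::vector<std::thread> workers;

    // Each worker gets one contiguous, non-overlapping chunk of the input.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &total, &total_mutex, begin, end] {
            long long local = 0;  // private to this thread
            for (std::size_t i = begin; i < end; ++i) local += data[i];

            // The single critical section: merge the partial result.
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

No thread's behavior depends on what the others are doing, so the only thing to check for thread-safety is the two lines under the lock.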
If the APIs that you're interacting with are side-effect free, then it's easy. If they are full of side effects, then they weren't written with multithreading in mind, and you wouldn't even be able to compile the equivalent code in Rust. C++ just takes off the training wheels.
It's a bit more complicated than that, because code can be thread-safe but not side-effect-free, but basically you're just restating what I said. C++ makes it hard to be sure code is really safe to use across threads, which means in practice developers should be more reluctant to do so.
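To make the "thread-safe but not side-effect-free" case concrete, here is a small C++ sketch. `IdGenerator` and `issue_ids_concurrently` are hypothetical names invented for this example. Every call to `next()` mutates shared state, yet concurrent calls are well-defined because the increment is atomic.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical ID generator: next() has a side effect (it changes the
// counter), but it is safe to call from many threads at once.
class IdGenerator {
public:
    int next() { return counter_.fetch_add(1, std::memory_order_relaxed); }
    int issued() const { return counter_.load(std::memory_order_relaxed); }

private:
    std::atomic<int> counter_{0};
};

// Calls next() from several threads and returns how many IDs were issued.
int issue_ids_concurrently(int num_threads, int ids_per_thread) {
    IdGenerator gen;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&gen, ids_per_thread] {
            for (int i = 0; i < ids_per_thread; ++i) gen.next();
        });
    }
    // join() makes every worker's increments visible to this thread.
    for (auto& w : workers) w.join();
    return gen.issued();
}
```

Nothing in the signature `int next()` tells a caller any of this. Replace `std::atomic<int>` with a plain `int` and the code still compiles, but concurrent calls become a data race, which is the kind of uncertainty being described here.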