If you want performance, it's kinda funny to use pure Python and then mourn the fact that you can't run, on 8 cores, Python code that's 30-50x slower than C to begin with.
I really dislike this attitude. If Python is much easier, more pleasant and safer to write than C, then being able to get a ~8x (or even 4x) speedup by running on an 8 core machine is a perfectly reasonable thing to want to do.
Well, let's say you can't use numpy or similar - "C written by someone else" - and further, let's imagine a world where CPython has no GIL. Then there are 2 options:
1. You rewrite things in single-threaded C, getting a speed-up of 30-50x.
2. You rewrite things in multi-threaded Python, getting a speed-up of say 4-8x.
I argue that 1 is often actually safer than 2, because parallelism-related bugs are harder to hunt down than C's memory-overwriting bugs (this can be mitigated with good APIs and a good race detector working against these APIs - as in Cilk or https://github.com/yosefk/checkedthreads - but it still requires good test coverage).
Now if 2 were more pleasant and safe than 1, I might agree with you even though 1 gives the better speed-up. But I believe 1 is faster and safer.
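To make the concern concrete, here's a minimal sketch (mine, not from any real codebase) of the kind of parallelism bug I mean: a lost-update race on a shared counter. Even under today's GIL, "counter += 1" isn't atomic across threads, and in a hypothetical GIL-free CPython the window would only get wider.

    import threading

    counter = 0

    def work(n):
        global counter
        for _ in range(n):
            # Read-modify-write in several bytecodes: another thread can run
            # in between, and its increments get silently overwritten.
            counter += 1

    threads = [threading.Thread(target=work, args=(100_000,)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(counter)  # can come out well below 800_000, and differs run to run

Nothing crashes and no tool prints a warning; the result is just quietly wrong, which is exactly why a race detector plus good test coverage matters.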
-Python has a lot of very good and relatively simple APIs that are easy to get started with.
-Python allows you to write with little concern for memory management.
-Python allows for (nearly) type-free interaction.
-C allows direct memory management.
-C gives you strongly typed variables and execution without runtime reflection (though 'auto'-style type inference was recently added to gcc!).
-C allows you to interact with kernel-level APIs almost directly.
I don't see why those same things couldn't, in theory, be achieved in a newer spinoff of C/C++, with twists on the rules of the environment that you yourself are comfortable writing in. Realistically there's no reason to neglect what either offers -- we should strive to make both achievable in the most comfortable and adaptable grammar.
Maybe one that allows you to set your own rules when you need them, to enforce them, and to configure those safeties in the modules or components that need them, at compile time!
It really depends on the problem, though. There are plenty of problems where it's reasonably easy to get a parallelism speedup (even if not close to the ideal speedup) by rewriting or restructuring only small parts of the program, but where rewriting in C would be a hell of a lot of complex work.
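To illustrate, here's a rough sketch (function and data invented for the example) of what restructuring a small part can look like - assuming either work that releases the GIL or the hypothetical GIL-free interpreter from upthread, since under today's GIL a thread pool won't speed up CPU-bound pure Python:

    from concurrent.futures import ThreadPoolExecutor

    def score(record):
        # stand-in for the CPU-heavy pure-Python function dominating the runtime
        return sum(ord(c) for c in record) % 97

    records = ["alpha", "bravo", "charlie", "delta"] * 10_000

    # Before: results = [score(r) for r in records]
    # After: only this call site changes; the rest of the program stays as-is.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(score, records))

Rewriting score() and everything it touches in C would be a much bigger job than swapping one loop for a pool.map call.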
A lot of people write terrible C code with a plethora of hard-to-find memory bugs. A lot of other people don't even know C at all. The GIL means that people can't even try[1] to get some speedup from Python without having to reach for C.
[1] I know, I know, Python lets you use multiple processes for parallelism. I also dislike the attitude that this is an adequate solution and that thread-based parallelism isn't needed, but that's a story for another day.
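For completeness, the process-based route that footnote refers to looks roughly like this (an invented workload, just for illustration); it sidesteps the GIL, but every argument and result gets pickled across process boundaries and there's no cheap shared mutable state between workers:

    from multiprocessing import Pool

    def score(record):
        # CPU-bound pure-Python work, now running in separate processes
        return sum(ord(c) for c in record) % 97

    if __name__ == "__main__":
        records = ["alpha", "bravo", "charlie", "delta"] * 10_000
        with Pool(processes=8) as pool:
            results = pool.map(score, records)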
Quite a few Python programmers I know are physicists or other non-professional programmers who really don't know how to write C, use gdb, debug linking problems, etc. For them, staying in pure Python is a big plus, even if it means reading a bit about locks and threads.
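For scale, the "bit about locks and threads" they'd need often amounts to something like this toy sketch (data and names invented), where a threading.Lock guards the shared dictionary:

    import threading

    totals = {}
    lock = threading.Lock()

    def tally(samples):
        for key, value in samples:
            with lock:  # the one genuinely new concept: guard shared state
                totals[key] = totals.get(key, 0) + value

    chunks = [[("energy", 1.5), ("momentum", 0.7)] * 1000 for _ in range(4)]
    threads = [threading.Thread(target=tally, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

That's a lot less to learn than C, gdb and a linker.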