And in CPython the flaw is less the GIL itself than the refcounted GC. Back in '96 there was a patch to remove the GIL, and run-of-the-mill single-threaded Python code ran 2-6x slower (largely depending on the threading implementation used) because of all the locking overhead around the refcount updates, which happen constantly. When a language is already perceived as slow, making the vast majority of typical scripts of the time that much slower so that multithreaded Python could be faster was going to be a very hard sell.
I don't necessarily agree that it's a flaw, but that's another discussion entirely.
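
For a sense of how often those refcount updates happen, here is a minimal sketch using `sys.getrefcount`. It assumes a regular CPython build; the function name `refcount_deltas` is just made up for illustration. Every plain assignment or container store touches the object's count, and without the GIL each of those touches would need a lock or an atomic operation.

```python
import sys

def refcount_deltas():
    """Return how the refcount of one object moves under trivial operations."""
    obj = object()
    base = sys.getrefcount(obj)          # baseline (includes the call's own temp ref)

    alias = obj                          # plain assignment: one more reference
    after_alias = sys.getrefcount(obj)

    box = [obj, obj]                     # two list slots: two more references
    after_list = sys.getrefcount(obj)

    del alias, box                       # dropping them gives the references back
    after_del = sys.getrefcount(obj)

    # Report changes relative to the baseline so the absolute count doesn't matter.
    return (after_alias - base, after_list - base, after_del - base)

print(refcount_deltas())
```

On standard CPython this should print `(1, 3, 0)`: three trivial statements, five refcount writes on a single object.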