To clarify that the CPython devs aren't being arbitrary here: There have been attempts at per-object or other fine-grained locking, and they appear to be less performant than a GIL, particularly for the single-threaded case.
Single-threaded performance is a major issue as that's most Python code.
Yes. I expect generic fine-grained locking, especially per-object leaks, to be less performant for multi-threaded code too, as locks aren't cheap, and even with the GIL, lock overhead could still be worse than a good scheduler.
Any solution which wants to consider per-object locking has to consider removing refcounting, or locking the refcount bits separately, as locking/unlocking objects to twiddle their refcounts is going to be ridiculously expensive.
Ultimately, the Python ownership and object model is not condusive to proper threading, as most objects are global state and can be mutated by any thread.
Single-threaded performance is a major issue as that's most Python code.