x86's variable-length encoding makes decoders way more complex than they are for a fixed-width instruction set. In turn this leads to hacks like hyperthreading, which lets the decoders work in parallel on separate threads and keep the whole core busy. Notice how it is absent from the M1, since Apple was able to throw more decoders at a single instruction stream and make single-threaded performance great.
Hyperthreading isn't just about decoders, it's about keeping all the execution units occupied as much as possible. Some Arm (RIP ThunderX2), POWER and SPARC server chips have 4-way and even 8-way SMT.
Multiprocess Python code benefits massively from hyperthreading, and it's not bottlenecked by instruction decoding at all. It's bottlenecked by pointer chasing.
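If you want to poke at this yourself, here is a rough sketch of a pointer-chasing workload spread over several processes. Everything in it (`build_chain`, `chase`, `timed_run`, the sizes) is made up for illustration, and the numbers you get depend entirely on your CPU, so treat it as a starting point and not a benchmark. The idea is to compare the wall time when the worker count equals your physical core count against the wall time when it equals the logical core count.

```python
import random
import time
from multiprocessing import Pool


def build_chain(n, seed=0):
    """Build one random cycle over n nodes: chain[i] is the node after i."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    chain = [0] * n
    for a, b in zip(order, order[1:] + order[:1]):
        chain[a] = b
    return chain


def chase(args):
    """Follow the chain for `steps` hops; each hop depends on the previous one."""
    n, steps = args
    chain = build_chain(n)
    i = 0
    for _ in range(steps):
        i = chain[i]  # dependent load: the next index is unknown until this one resolves
    return i


def timed_run(workers, n=200_000, steps=1_000_000):
    """Run one chase per worker process and return the elapsed wall time in seconds."""
    start = time.perf_counter()
    with Pool(workers) as pool:
        pool.map(chase, [(n, steps)] * workers)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumed worker counts; adjust to your physical and logical core counts.
    for workers in (1, 2, 4, 8):
        print(workers, "workers:", round(timed_run(workers), 2), "s")
```

The default sizes are kept small so it finishes quickly; you would likely need a larger `n` and `steps` before the working set falls out of cache and the effect becomes visible. Note that the timing also includes building the chain in each worker.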