My argument isn't that modern software would run faster on such a CPU; my argument is instead that:
1. The most trivially true version of the argument: software written for that CPU would run faster on a modern "remastering" of that CPU than it would run on a modern CPU, whether directly (via a lot of microcode-level emulation) or indirectly (via actual software emulation). (Yes, some software that's still binary-forward-compatible with modern CPUs, using only generic ISA ops, would be faster on the modern CPU. But I'm talking about the worst, most persnickety edge-case uses of the ISA: the kinds of "requires a whole different model of the world to have the right side-effects" ops that make IBM write emulators for their previous mainframe architectures, rather than just shimming those ops into their new POWER ISAs and doing load-time static recompilation to the new ISA. See the prefetch-queue sketch after this list for a concrete example.)
2. Smaller transistors would mean less total power draw per cycle (i.e. it's a rather dim bulb), which means you could overclock the heck out of that CPU. (See the first-order arithmetic after this list.)
3. As long as you don't also make the die any smaller (but instead lay out your small transistors with super-long trace paths between them), you're not decreasing the thermal surface area of the die in the process, so you can then attach a modern cooling setup to it and clock it even higher.
4. Or, if you like, you can shrink the die and produce a compact 10nm 8088, at which point it'd probably be, say... 1 sq. mm? Smaller than a Cortex-M0+, for sure. That's the point when things are small enough that you can start to do wacky things like bathing the entire (uncapped) die surface in a laser tuned so the die re-emits at a shorter wavelength than it absorbs (anti-Stokes fluorescence cooling), an indiscriminate, whole-die cousin of the laser cooling used in ion traps.
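To make (1) concrete: the classic 8088 "whole different model of the world" behavior isn't any single op but the 4-byte prefetch queue. On real silicon, a write that patches a byte the bus-interface unit has already fetched still executes the stale byte, and some copy-protection and CPU-detection code depended on exactly that. Below is a minimal sketch in C of what a bug-compatible emulator has to model; every name here (Cpu8088, fetch_byte, ...) is invented for illustration, segmentation and jumps (which flush the queue) are omitted, and it's not taken from any real emulator.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define QUEUE_LEN 4  /* the 8088's prefetch queue is 4 bytes (the 8086's is 6) */

typedef struct {
    uint8_t  mem[1 << 20];      /* the full 1 MiB physical address space */
    uint32_t ip;                /* linear fetch address; segmentation omitted */
    uint8_t  queue[QUEUE_LEN];  /* bytes the BIU has already pulled off the bus */
    int      q_count;
} Cpu8088;

/* Bus side: top the queue up, as the BIU does while the EU is busy.
 * Invariant: queue[i] holds the byte fetched from address ip + i. */
static void prefetch(Cpu8088 *c) {
    while (c->q_count < QUEUE_LEN) {
        c->queue[c->q_count] = c->mem[(c->ip + c->q_count) & 0xFFFFF];
        c->q_count++;
    }
}

/* Execution side: opcodes come from the queue, NOT from memory. */
static uint8_t fetch_byte(Cpu8088 *c) {
    prefetch(c);
    uint8_t b = c->queue[0];
    memmove(c->queue, c->queue + 1, (size_t)--c->q_count);
    c->ip = (c->ip + 1) & 0xFFFFF;
    return b;
}

/* The quirk: a memory write does NOT invalidate the queue. An emulator
 * that decodes straight out of mem[] silently gets this wrong. */
static void write_byte(Cpu8088 *c, uint32_t addr, uint8_t v) {
    c->mem[addr & 0xFFFFF] = v;  /* queue deliberately left stale */
}

static Cpu8088 cpu;  /* static because 1 MiB is too big for the stack */

int main(void) {
    cpu.ip = 0x100;
    cpu.mem[0x100] = 0x90;          /* NOP */
    prefetch(&cpu);                 /* BIU runs ahead: 0x100..0x103 queued */
    write_byte(&cpu, 0x101, 0xF4);  /* patch the NEXT byte to HLT */
    fetch_byte(&cpu);               /* consume the NOP */
    /* Prints 00, the stale prefetched byte, even though memory holds F4: */
    printf("executed byte: %02X\n", fetch_byte(&cpu));
    return 0;
}
```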
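And to put (2) in first-order terms (a back-of-the-envelope sketch; the transistor counts are ballpark figures, not measurements): CMOS dynamic power goes as

$$P_{\text{dyn}} \approx \alpha\, C\, V^{2} f$$

with α the activity factor, C the total switched capacitance, V the supply voltage, and f the clock. An 8088 is about 29,000 transistors against the billions in a modern core, so a die-shrunk 8088's switched C is four-plus orders of magnitude smaller, and at any fixed power budget essentially all of that slack can be spent raising f, until wire delay rather than heat becomes the wall.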
But what would you do with that speed? Let’s say you run that 8088 at 10 GHz, about a factor of 1,000 faster than a fast 8088 (which topped out around 10 MHz).
What useful algorithm needs that speed but no more than 1 MB of memory? (That CPU could read and write its entire address space about a thousand times a second.)
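To ground that parenthetical (assuming the 8088's 4-clock, one-byte bus cycle and zero wait states):

$$\frac{10\ \text{GHz}}{4\ \text{clocks/byte}} = 2.5\ \text{GB/s}, \qquad \frac{2.5\times10^{9}\ \text{B/s}}{2\times2^{20}\ \text{B per read+write pass}} \approx 1200\ \text{passes per second}.$$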
"Maybe thousands of computer architects are incompetent and don't realize there's no benefit from anything they've done in the past 40 years".
Does that sound likely?
Disclaimer: I'm one of those architects.
Edit: I tried to steel-man your argument, but I couldn't see how to make it work.