In terms of instructions per clock and I/O bandwidth, it compares more favorably to 16-bit architectures than to 32-bit ones, even though the Cortex-M family is nominally 32-bit.
The RP2040 is designed to give both cores full bandwidth at once, without contention. Combined with the PIOs, that lets you do some really impressive bit-banging.
Plus, some of the early 8-bit machines drove a display with minimal extra hardware, e.g. the ZX80 / ZX81, although they were very, very slow as a result!