I did something similar with the NES: I built an emulator and come back to it every couple of years. It's had 2 CPU cores (the 2nd added recognition of idle loops and runs instructions in chunks instead of 1 at a time), 3 PPU implementations (each iteration emulated at a higher level, adding caching and hardware compositing), and 2 implementations of the ROM subsystem (the second added caching and multiple memory mappers).
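For anyone curious what idle-loop recognition plus chunked execution can look like, here is a minimal sketch. It is not the code from my emulator: `TinyCpu`, `run_chunk`, and the `detect_idle` flag are hypothetical names, only two 6502 opcodes (NOP and JMP absolute) are handled, and the only idle loop it recognizes is a jump to itself. Real games more often spin on a PPU status read, which needs a smarter detector.

```python
JMP_ABS = 0x4C  # 6502 JMP absolute: 3 bytes, 3 cycles
NOP = 0xEA      # 6502 NOP: 1 byte, 2 cycles


class TinyCpu:
    """Hypothetical two-opcode CPU, just enough to show the control flow."""

    def __init__(self, memory, pc=0, detect_idle=True):
        self.memory = memory
        self.pc = pc
        self.detect_idle = detect_idle
        self.executed = 0  # number of instructions actually decoded

    def step(self):
        """Execute one instruction; return (cycles_used, is_idle_loop)."""
        opcode = self.memory[self.pc]
        self.executed += 1
        if opcode == NOP:
            self.pc += 1
            return 2, False
        if opcode == JMP_ABS:
            target = self.memory[self.pc + 1] | (self.memory[self.pc + 2] << 8)
            # A jump to its own address can only be left via an interrupt.
            idle = target == self.pc
            self.pc = target
            return 3, idle
        raise ValueError(f"unsupported opcode {opcode:#04x}")

    def run_chunk(self, budget):
        """Run instructions until the cycle budget is spent; return cycles used."""
        used = 0
        while used < budget:
            cycles, idle = self.step()
            used += cycles
            if idle and self.detect_idle:
                # Fast-forward: account for the loop iterations we would have
                # executed anyway, without decoding them one by one.
                remaining = max(0, budget - used)
                iterations = (remaining + cycles - 1) // cycles
                used += iterations * cycles
                break
        return used


# Two NOPs followed by "JMP $0002", i.e. a jump to itself.
program = [NOP, NOP, JMP_ABS, 0x02, 0x00]
fast = TinyCpu(program, detect_idle=True)
slow = TinyCpu(program, detect_idle=False)
fast_cycles = fast.run_chunk(100)
slow_cycles = slow.run_chunk(100)
```

Both runs consume the same number of cycles and end at the same program counter, but the one with detection decodes far fewer instructions. The caller (the frame loop, in an emulator) would hand out a cycle budget per chunk and service interrupts between chunks.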
There are similar testing ROMs for NES emulators, too. My emulator tends to pass the CPU tests and fail some of the PPU ones; guess I need to write a 4th implementation of that chip!