When I was in college, a 6800-derived embedded systems prototype board used 'really fast' in-package memory for such a setup.
In the context of something vaguely like an SoC, where you can make "0 page" memory registers fast with a small bank of high-speed SRAM, it can make sense, particularly for decoupling manufacturing defects or silicon production processes.
Of course, for modern pipelined systems, potentially out-of-order and with speculative branch prediction, this is a horrid idea.
Well, actually, once you buy into everything you need for O-O-O execution with synchronous exceptions, a lot of stuff that seems difficult at first glance becomes cheap, because you can build on the existing O-O-O infrastructure. Anytime you can belly-flop onto the reorder buffer scoreboard, the hard stuff just falls out.