It's amazed me that the first 50 years of computers were "this is how to structure memory for operating systems" and the next 30 have been "hackers take advantage of all of that so we need to do a bunch of convoluted stuff just to try to thwart them." Kind of unfortunate that so much energy has to be spent on this work, but I appreciate that it is.
Harvard architecture machines (which are not uncommon in microcontrollers)
Segmented memory, or any non-flat memory
Addressable memory with non-uniform access time (cache doesn't count because cache lines can't be addressed directly)
Address spaces whose size is not a power of 2.
Variable byte sizes ("byte" did not mean "8 bits" until the 360, and even in those days, only in the IBM world).
Word length larger than address length.
Some hardware-tagged architectures.
Machines with hardware-supported transporting GCs.
Different regions of memory that are architecturally distinct (shared memory with machines of different architectures, which these days can mean GPUs).
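For a sense of how a few of these show up in portable C, here is a small sketch. The function names are made up for illustration. It relies only on what the C standard exposes: CHAR_BIT from limits.h, and the fact that data pointers and function pointers are distinct kinds of pointer that need not have the same size.

```c
#include <limits.h>
#include <stddef.h>

/* Bits in a C "byte". The standard guarantees at least 8, not exactly 8. */
unsigned bits_per_byte(void)
{
    return (unsigned)CHAR_BIT;
}

/* Storage bits in an int, which need not match the pointer width. */
unsigned bits_per_int(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT);
}

/* Returns 1 if data and function pointers have the same size, else 0.
   On a Harvard or segmented target the two sizes may differ. */
int code_and_data_pointers_same_size(void)
{
    return sizeof(void *) == sizeof(void (*)(void));
}
```

On a typical desktop these report 8-bit bytes and equal pointer sizes, which is exactly why code that hard-codes those assumptions tends to break only when it reaches one of the machines in the list above.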
One alternative CPU model was the various Lisp machines.
A really amazing example, though, was Intel's first 32-bit CPU, the iAPX 432 [1] -- it was actually object-oriented. And it supported garbage collection (like Lisp machines).
It was kind of beautiful from one point of view, but it was absurdly impractical, complicated, and slow. It was an extreme example of CISC, and the (simple and fast) RISC revolution killed off such things. Well, the iAPX 432 killed itself, but...
Ok, but they're not friendly to anything but assembler.
Well, since they typically have memory whose cycle time is in the ballpark of the instruction cycle time (unlike desktop and server CPUs, where there are two orders of magnitude between them), that's friendly to Forth, I suppose. But Forth is a small minority of usage even in those environments. It's more like Forth is friendly to slow architectures. :)
I think most of these solutions probably predate the 8051 and ST7 cores (both of which have stack pointers), which is where I ran into them, but reduced memory models are still pretty useful on those parts because of the overhead relative to the RAM usually fitted. It's too late to edit my comment, so here I'll mostly discuss the fallout from statically allocating what would normally be stack variables.
From https://www.st.com/resource/en/user_manual/um0015-st7-8bit-m... (8.3.5 Limitations put on the full implementation of C language) (I think this is talking about Hicross C, but COSMIC and Raisonance work the same depending on stack memory model):
"The ST7 family provides a limited RAM size, of which the stack takes only a part, that can be as small as 64 bytes. This does not allow the use of the stack for parameter passing. Thus, the implementation of C for the ST7 uses registers and a few memory locations to pass the parameters, and allocates local variables to RAM just like global variables. This works the same way as in a typical implementation, but with the following restrictions..."
You can still get an evaluation copy of COSMIC C and try this out. Here I made a function call itself, void port_init(void){port_init();}, and got the error below. Note that it comes from the linker, clnk, not the compiler, because the linker is responsible for stack allocation globally, as described above:
#error clnk vumeter.lkf:1 function _port_init is recursive.
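A rough sketch of why the linker refuses recursion under this model. This is ordinary portable C, not COSMIC-specific code: the static keyword stands in for a compiler that gives each local one fixed RAM address, and the function names are made up for illustration.

```c
/* Conventional version: every call gets its own n and rest on the stack. */
unsigned sum_to_stack(unsigned n)
{
    unsigned rest;

    if (n == 0)
        return 0;
    rest = sum_to_stack(n - 1);
    return n + rest;
}

/* Static-allocation version: each "local" is one shared slot, not one
   per call, which is an assumption standing in for the ST7 memory model. */
unsigned sum_to_static(unsigned n)
{
    static unsigned saved_n;
    static unsigned rest;

    saved_n = n;
    if (saved_n == 0)
        return 0;
    rest = sum_to_static(saved_n - 1);  /* inner calls overwrite saved_n */
    return saved_n + rest;              /* saved_n is 0 by the time we get here */
}
```

Calling sum_to_stack(3) gives 6, while sum_to_static(3) gives 0, because the innermost call leaves 0 in the single shared slot and every outer call then reads that instead of its own argument. Rejecting the recursion at link time is the safer outcome.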
There's a similar error if you call the same function both from anything reachable from main() and from any interrupt entry point. This is because the memory model isn't re-entrant, so reaching the same function from more than one path can cause the two invocations to overlap, corrupting its statically allocated variables.
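The interrupt case can be mimicked on a desktop with a sketch like the following. Everything here is hypothetical scaffolding: pending_irq is a plain function pointer standing in for a real interrupt that happens to fire in the middle of the helper, and static again stands in for the fixed-address locals.

```c
#include <stddef.h>

/* Hypothetical hook: if set, it "fires" once in the middle of times_ten(). */
void (*pending_irq)(void) = NULL;

/* Last value computed by the simulated interrupt handler. */
unsigned irq_result;

/* Helper whose temporary has one fixed address shared by every caller. */
unsigned times_ten(unsigned x)
{
    static unsigned tmp;

    tmp = x * 2;
    if (pending_irq != NULL) {          /* simulate an interrupt arriving here */
        void (*irq)(void) = pending_irq;
        pending_irq = NULL;             /* fire only once */
        irq();
    }
    return tmp * 5;
}

/* Simulated interrupt handler that calls the same helper. */
void timer_isr(void)
{
    irq_result = times_ten(1);
}

/* Main-path call to times_ten(7) with the simulated interrupt armed. */
unsigned main_path_with_irq(void)
{
    pending_irq = timer_isr;
    return times_ten(7);
}
```

With no interrupt armed, times_ten(7) returns 70. Through main_path_with_irq() it returns 10, because the handler's call overwrote tmp before the main path read it back. A compiler using real stack frames would not have this problem, at the cost of the stack space the ST7 doesn't have.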