is there an example of machine code that doesn't make use of a linear contiguous call stack?
what would the alternative be? compute the size of all the stack frames a priori in the compiler, spray them all over main memory, and then maintain a linear contiguous list of addresses? doesn't the linear contiguous nature of function call stacks in machine code preserve locality in order to make more efficient use of caches? or would the caches have to become smarter in order to know to preserve "nearby" stack frames when possible?
also, why not just make the addresses wider and put the pid in the high bits? they're already doing this masking stuff for the security descriptors, why not just throw the pid in there as well and be done with it?
> is there an example of machine code that doesn't make use of a linear contiguous call stack?
Early CPUs didn't have support for a stack, and some early languages such as COBOL and Fortran didn't need one. They didn't allow recursive function calls, so return addresses could be stored at fixed addresses, and a return could be either an indirect jump reading from that address, or a direct jump whose target got rewritten when the return address was stored (see https://people.cs.clemson.edu/~mark/subroutines.html for the history of subroutine calls).
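To make the fixed-return-slot scheme concrete, here's a toy model in C (hypothetical names; the real machines did this with indirect or self-modified jumps, not function pointers):

    #include <stdio.h>

    /* Toy model of pre-stack subroutine linkage: one fixed return slot
     * per subroutine instead of a stack of return addresses.  A
     * recursive call to sub() would overwrite sub_return, which is
     * exactly why those languages forbade recursion. */

    static void (*sub_return)(void);   /* the one fixed slot for sub() */

    static void sub(void) {
        printf("in subroutine\n");
        sub_return();                  /* "return" = jump through the slot */
    }

    static void back_in_caller(void) {
        printf("back in caller\n");
    }

    int main(void) {
        sub_return = back_in_caller;   /* store the return address... */
        sub();                         /* ...then jump to the subroutine */
        return 0;
    }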
If you put the pid in the high bits of every pointer you run into problems when you want to support fork(). The child gets a new pid but keeps all the same pointers as its parent.
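A tiny demo of that (just an illustration, not anyone's proposal): after fork() the child sees a bit-identical copy of the parent's address space, so every pointer keeps the same value while the pid changes underneath it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int *p = malloc(sizeof *p);    /* allocated before the fork */
        pid_t child = fork();
        if (child == 0) {
            /* New pid, same bit pattern in p: a pid embedded in the
             * pointer's high bits would now identify the wrong process. */
            printf("child  pid=%d p=%p\n", (int)getpid(), (void *)p);
        } else {
            printf("parent pid=%d p=%p\n", (int)getpid(), (void *)p);
            wait(NULL);
        }
        free(p);
        return 0;
    }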
And that only protects you against one process trying to read/write another process's memory. We already have good support for this type of protection on current hardware/OSes. What is desired is something that prevents a process from reading/writing its own memory through an out-of-bounds pointer. This would prevent buffer overflows, stack smashing, etc., and be a good defense against RCE.
Non-contiguous stacks are typically just linked lists of call frames, or groups of call frames. One of the examples that used to be regularly trotted out on comp.lang.c 15+ years ago was IBM mainframe environments, where stacks were linked lists but still managed by hardware.
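A minimal sketch of the linked-frame idea, modelled in C with explicit malloc/free (names are made up, and the machine stack is of course still used underneath; in the real systems the compiler or hardware maintains the links):

    #include <stdio.h>
    #include <stdlib.h>

    /* Each activation record is heap-allocated and linked to its
     * caller's, so frames can live anywhere in memory. */
    struct frame {
        struct frame *caller;   /* link to the previous activation record */
        int n, result;          /* this frame's locals */
    };

    static int fact(struct frame *caller, int n) {
        struct frame *f = malloc(sizeof *f);   /* "push" = allocate */
        f->caller = caller;
        f->n = n;
        f->result = (n <= 1) ? 1 : n * fact(f, n - 1);
        int r = f->result;
        free(f);                               /* "pop" = free */
        return r;
    }

    int main(void) {
        printf("%d\n", fact(NULL, 5));  /* prints 120 */
        return 0;
    }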
Off the top of my head, current examples of non-contiguous stacks on more familiar systems, with supporting code compiled directly to machine code (i.e. not language interpreters and VMs, where this is actually very common), include:
1) gccgo, which compiles Go code to use split stacks.
2) gcc with -fsplit-stack, which permits compiling C (and C++?) code to use the same stack discipline as gccgo.
3) OCaml's new concurrency work utilizes non-contiguous stacks. I'm not familiar with OCaml toolchains, but notably they do generate DWARF and related tables so both debuggers and the garbage collector can properly walk logical stacks. See https://arxiv.org/abs/2104.00250
4) There's a stack-smashing and ROP prevention technique called shadow stacks, where a single logical call stack is implemented using two contiguous stacks, one primarily for return addresses (or related metadata) and the other primarily for objects. See https://en.wikipedia.org/wiki/Shadow_stack Both GCC and clang support software-implemented shadow stack disciplines (e.g. SafeStack), and some newer Intel and ARM ISA extensions implement hardware-enforced shadow stacks. (Rough sketch below.)
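A hand-rolled illustration of the shadow-stack idea (not how SafeStack or CET actually instrument code; __builtin_return_address is a real GCC/clang builtin, everything else here is made up, and a compiler may keep the return address in a register, so treat this purely as a sketch):

    #include <stdlib.h>
    #include <string.h>

    /* Return addresses are mirrored onto a second stack on entry and
     * compared on exit, so an overflow that rewrites the return address
     * on the data stack is caught before the function returns through it. */

    #define SHADOW_DEPTH 1024
    static void *shadow[SHADOW_DEPTH];
    static int shadow_top;

    static void shadow_push(void *ret) {
        if (shadow_top == SHADOW_DEPTH) abort();
        shadow[shadow_top++] = ret;
    }

    static void shadow_check(void *ret) {
        /* Mismatch = the saved return address was overwritten: abort
         * instead of jumping to attacker-controlled code. */
        if (shadow_top == 0 || shadow[--shadow_top] != ret) abort();
    }

    void vulnerable(const char *input) {
        shadow_push(__builtin_return_address(0));  /* "prologue" */
        char buf[16];
        strcpy(buf, input);                        /* the classic overflow */
        shadow_check(__builtin_return_address(0)); /* "epilogue" */
    }

Real implementations do the push/check in compiler-generated prologues and epilogues (or in hardware), which is what makes them actually reliable.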
The linked article doesn't mention call stacks explicitly, but it describes the R1000 arch as object+offset addressed in HW. So unless they restricted the call stack to fit into one object and used only offsets, then yes, they must have chained objects together for the stack.
When you have a page-based memory model, you've made address locality important by construction. If you have an object-based memory model, and the working set is made up of objects, not pages, then address locality between objects doesn't matter.
Of course, page-based memory models are by FAR the most common in practice.
(Note: pages ARE objects, but the objects are significant to the VM system and not to your program. So strictly, page-based models are a corner case of object-based models, where the objects are obscure.)
would be interesting to see how the actual call stack is implemented. they must either have a fixed-width object as you mention, or some kind of linear chaining like you're describing.
memory and disk are unified into one address space, and code is represented by this "DIANA" structure, which can be compressed text, text, AST, or machine code. would be curious how procedures are represented in machine code.
When you reverse engineer "normal" code, you know the CPU can handle integers of X, 2X, 4X bit widths, you know it wants things aligned this way or that way etc.
The R1000 is bit addressed, and integers can be any width from 1 to 64 bits, structures can be "variant" (See: Ada) so reverse engineering the storage layout is ... not an accomplished feat yet.
Since the primary task for the R1000 was Ada program development, and since there was a standardized semantic representation of Ada programs ("DIANA"), one of the fundamental hardware/microcode data-types is "DianaTree", which is, I think, a pretty generic CS tree of some kind.
These "DianaTree" types seem to also have been used for other stuff, like directories in the object store etc.
Fun fact: for many languages a linear memory block doesn't work, because execution context lifetimes aren't strictly LIFO. For example, continuations are a language feature that causes this (unless they're restricted, e.g. to single-shot).
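You can get a taste of non-LIFO context lifetimes even in C with POSIX ucontext (a real, if obsolescent, API; it only gives you one-shot coroutines, which matches the single-shot caveat above): a suspended context's frames stay alive on their own heap-allocated stack while the caller keeps running.

    #include <stdio.h>
    #include <stdlib.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, coro_ctx;

    static void coro(void) {
        printf("coro: first run\n");
        /* Suspend: these frames stay live while main runs. */
        swapcontext(&coro_ctx, &main_ctx);
        printf("coro: resumed\n");
    }   /* returning switches to uc_link, i.e. back to main */

    int main(void) {
        getcontext(&coro_ctx);
        coro_ctx.uc_stack.ss_sp = malloc(64 * 1024); /* heap-allocated stack */
        coro_ctx.uc_stack.ss_size = 64 * 1024;
        coro_ctx.uc_link = &main_ctx;
        makecontext(&coro_ctx, coro, 0);

        swapcontext(&main_ctx, &coro_ctx);  /* run coro until it suspends */
        printf("main: coro suspended, its frames outlive this point\n");
        swapcontext(&main_ctx, &coro_ctx);  /* resume coro to completion */
        free(coro_ctx.uc_stack.ss_sp);
        return 0;
    }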
are there alternatives to linearly growing call stacks?