That's another option for doing it. However, writing in CPS implies that your calls are all tail calls, and they all take a continuation closure as an argument.
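To make the "all tail calls, each taking a continuation closure" point concrete, here is a minimal C sketch (factorial is an arbitrary example, and the Cont layout and function names are hypothetical). The work a normal stack frame would remember ("multiply by n when the callee returns") is stored instead in a heap-allocated continuation record, and every call is in tail position. C does not guarantee tail-call elimination, so without an optimizing compiler the C stack may still grow with n.

```c
#include <assert.h>
#include <stdlib.h>

/* A continuation closure: a code pointer plus its captured environment. */
typedef struct Cont Cont;
struct Cont {
    long (*fn)(Cont *self, long value); /* what to do with the result */
    long saved_n;                       /* captured multiplier */
    Cont *next;                         /* captured outer continuation */
};

/* Identity continuation: hands the final result back to the caller. */
static long done(Cont *self, long value) {
    (void)self;
    return value;
}

/* Continuation for "multiply by saved_n, then continue with next". */
static long multiply_then(Cont *self, long value) {
    long result = self->saved_n * value;
    Cont *next = self->next;
    free(self);                    /* this pending step is finished */
    return next->fn(next, result); /* tail call */
}

/* CPS factorial: never does work after a call returns;
   all pending work lives in the heap-allocated chain k. */
static long fact_cps(long n, Cont *k) {
    assert(n >= 0);
    if (n == 0)
        return k->fn(k, 1);        /* tail call into the continuation */
    Cont *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->fn = multiply_then;
    c->saved_n = n;
    c->next = k;
    return fact_cps(n - 1, c);     /* tail call */
}

static long factorial(long n) {
    static Cont identity = { done, 0, NULL };
    return fact_cps(n, &identity);
}
```

Here the chain of Cont records plays the role the stack would normally play, which is why CPS and heap-allocated frames keep coming up together.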
I know what CPS is. But Dan states that "CPS is more efficient than growable stacks", and I can't see how CPS plays a nontrivial part in the stack size optimization.
If you're allocating the stack frame on the heap, you can just replace the usual prologue (e.g. x86, Intel syntax, cdecl calling convention: "push ebp; mov ebp, esp; sub esp, LOCAL_SPACE_NEEDED") with a call to "allocate_frame". Here allocate_frame allocates enough space to store a link to the previous stack frame, all the arguments and all the locals; it then copies the arguments, the return address and the stack pointer, switches to the new stack, and returns from the call.
You can modify the function's return sequence to call "deallocate_frame" (which would switch stacks back before returning), or you can just make sure the return address on the new stack points to "deallocate_frame_and_return" and use a regular return in place.
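Portable C can't switch the machine stack pointer, so the following is only a model of the bookkeeping described above, not the assembly-level mechanism. allocate_frame and deallocate_frame are named after the comment; the Frame layout, the slot accessors and sum_to are hypothetical. Each frame holds the link to the previous frame, a copy of the arguments and zeroed locals; the return address and saved stack pointer are left out because the C compiler manages those itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-allocated activation record. */
typedef struct Frame {
    struct Frame *prev; /* link to the caller's frame */
    size_t nargs;       /* number of argument slots */
    size_t nlocals;     /* number of local slots */
    long slots[];       /* arguments first, then locals */
} Frame;

static Frame *current_frame = NULL; /* top of the frame chain */

/* Stand-in for allocate_frame: reserve a frame on the heap, copy the
   arguments into it, zero the locals, and make it the current frame. */
static Frame *allocate_frame(const long *args, size_t nargs, size_t nlocals) {
    Frame *f = calloc(1, sizeof *f + (nargs + nlocals) * sizeof(long));
    if (f == NULL)
        abort();
    f->prev = current_frame;
    f->nargs = nargs;
    f->nlocals = nlocals;
    if (nargs > 0)
        memcpy(f->slots, args, nargs * sizeof(long));
    current_frame = f;
    return f;
}

/* Stand-in for deallocate_frame: pop back to the caller's frame. */
static void deallocate_frame(void) {
    Frame *f = current_frame;
    assert(f != NULL);
    current_frame = f->prev;
    free(f);
}

static long *arg_slot(Frame *f, size_t i) {
    assert(i < f->nargs);
    return &f->slots[i];
}

static long *local_slot(Frame *f, size_t i) {
    assert(i < f->nlocals);
    return &f->slots[f->nargs + i];
}

/* Example callee: sum of 1..n, keeping its argument and local in a heap frame. */
static long sum_to(long n) {
    Frame *f = allocate_frame(&n, 1, 1); /* replaces the prologue */
    long arg = *arg_slot(f, 0);
    long *acc = local_slot(f, 0);
    *acc = (arg <= 0) ? 0 : arg + sum_to(arg - 1);
    long result = *acc;
    deallocate_frame();                  /* replaces the epilogue */
    return result;
}
```

Because these frames never move, a pointer into a frame (like acc) stays valid across deeper calls, which a growable stack that relocates itself by copying would not guarantee without extra pointer fix-ups.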
[Actually, come to think of it, it's an excellent idea for a stack protection system that can be turned off for speed: the call to "allocate_frame" could be configured at runtime to patch its caller to just do "sub esp, LOCAL_SPACE_NEEDED" for maximum speed. Or it could just check for stack overflow; or it could allocate frames on the heap for better memory use; and it could mprotect/VirtualProtect the edges of the frame for protection.]
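A rough C model of the runtime-configurable idea in the aside: the first frame entry goes through a resolver that reads a setting and rebinds a function pointer, loosely imitating the "patch its caller" trick without any real code patching. Only two policies are sketched (no checking at all, and an overflow check against a byte budget); heap frames and mprotect/VirtualProtect guard pages are not modeled. All names and the budget scheme are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { POLICY_FAST, POLICY_CHECKED } FramePolicy;

static FramePolicy frame_policy = POLICY_FAST; /* set before the first call */
static size_t frame_limit = 0;                 /* CHECKED: total bytes allowed */
static size_t frame_used = 0;                  /* CHECKED: bytes in use */

/* FAST: no bookkeeping, standing in for a bare "sub esp, N". */
static bool enter_fast(size_t size) { (void)size; return true; }
static void leave_fast(size_t size) { (void)size; }

/* CHECKED: refuse to enter a frame that would exceed the budget. */
static bool enter_checked(size_t size) {
    if (size > frame_limit - frame_used)
        return false;                          /* would overflow */
    frame_used += size;
    return true;
}
static void leave_checked(size_t size) {
    assert(frame_used >= size);
    frame_used -= size;
}

static bool enter_resolve(size_t size);
static bool (*enter_frame)(size_t) = enter_resolve;
static void (*leave_frame)(size_t) = leave_fast;

/* First call only: pick the policy once and rebind both hooks. */
static bool enter_resolve(size_t size) {
    if (frame_policy == POLICY_CHECKED) {
        enter_frame = enter_checked;
        leave_frame = leave_checked;
    } else {
        enter_frame = enter_fast;
        leave_frame = leave_fast;
    }
    return enter_frame(size);
}

/* Recurse up to n levels of frame_bytes each; returns the depth reached. */
static int descend(int n, size_t frame_bytes) {
    if (n == 0)
        return 0;
    if (!enter_frame(frame_bytes))
        return 0;                              /* overflow detected: stop */
    int depth = 1 + descend(n - 1, frame_bytes);
    leave_frame(frame_bytes);
    return depth;
}
```

Because the hooks are rebound on the first call, changing frame_policy afterwards has no effect, much as a patched call site would keep its patched instruction. For example, with POLICY_CHECKED and a 1000-byte budget, descend(50, 64) stops after 15 frames of 64 bytes.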