I recently did something similar [1]. The major difference is that I store the registers on the stack instead of in a separate structure, so the stack itself is the thread structure.
I also have versions for 32-bit and 64-bit x86 (both work under Linux and Mac OS X). There isn't much assembly code, but it took a while to work out exactly what was needed and no more.
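For anyone wondering what "registers on the stack" can look like, here is a minimal sketch. It is not the code from [1]. It assumes x86-64 Linux (System V ABI, ELF symbol names) with GCC or Clang, and the names `coro_switch`, `coro_init`, `worker`, `run_demo` and `coro_trace` are made up for illustration. Only the callee-saved integer registers are saved; floating-point control state is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: push the callee-saved registers onto the current
   stack, store the resulting stack pointer in *save_sp, switch to load_sp,
   and pop that stack's registers. The saved stack pointer is the only
   per-thread state that lives outside the stack. */
void coro_switch(void **save_sp, void *load_sp);

__asm__(
    ".pushsection .text\n"
    ".globl coro_switch\n"
    "coro_switch:\n"
    "    pushq %rbp\n"
    "    pushq %rbx\n"
    "    pushq %r12\n"
    "    pushq %r13\n"
    "    pushq %r14\n"
    "    pushq %r15\n"
    "    movq  %rsp, (%rdi)\n"   /* *save_sp = current stack pointer */
    "    movq  %rsi, %rsp\n"     /* switch to the other stack */
    "    popq  %r15\n"
    "    popq  %r14\n"
    "    popq  %r13\n"
    "    popq  %r12\n"
    "    popq  %rbx\n"
    "    popq  %rbp\n"
    "    ret\n"                  /* resume wherever that stack left off */
    ".popsection\n"
);

static void *main_sp;   /* saved stack pointer of the original thread */
static void *co_sp;     /* saved stack pointer of the coroutine */
int coro_trace[8];      /* records the order in which code ran */
int coro_ntrace;

/* Coroutine body: record a value, then hand control back. Must not return. */
static void worker(void)
{
    for (int i = 1; ; i++) {
        coro_trace[coro_ntrace++] = i * 10;
        coro_switch(&co_sp, main_sp);
    }
}

/* Build an initial stack that looks as if worker had called coro_switch. */
static void *coro_init(void *base, size_t size, void (*entry)(void))
{
    uintptr_t top = ((uintptr_t)base + size) & ~(uintptr_t)15;
    uint64_t *sp = (uint64_t *)top;
    *--sp = 0;                            /* fake return address, never used */
    *--sp = (uint64_t)(uintptr_t)entry;   /* consumed by ret in coro_switch */
    for (int i = 0; i < 6; i++)
        *--sp = 0;                        /* initial rbp, rbx, r12..r15 */
    return sp;
}

/* Ping-pong three times between the caller and the coroutine. */
int run_demo(void)
{
    size_t size = 64 * 1024;
    void *base = malloc(size);
    assert(base != NULL);

    coro_ntrace = 0;
    co_sp = coro_init(base, size, worker);
    for (int i = 1; i <= 3; i++) {
        coro_trace[coro_ntrace++] = i;
        coro_switch(&main_sp, co_sp);
    }
    free(base);
    return coro_ntrace;
}
```

Calling `run_demo()` from `main` alternates between the two stacks. The entry address sits at a 16-byte-aligned slot so that `worker` starts with the stack alignment the ABI expects after a `call`.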
[1] http://boston.conman.org/2017/02/27.1