block of x86 code
je L43
more x86 code
jmp L57
L33:
sparc code never mind
L43:
x86 code
L37:
sparc never mind
L57:
x86 code again
That kind of thing: blocks of code mashed together, but every jump is generated so that it lands only on code of its own kind.
In regular compiler-generated machine code, we already have "foreign" blocks of stuff in the middle of the instructions, such as string (and other) literals, and computed branch tables. The generated code doesn't accidentally jump into these things. This is kind of the same: all the other architecture stuff is just a literal (that is not referenced). From the x86 POV, the stuff at L33 and L37 above is just data.
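To make the "it's just data" point concrete, here is a toy Python model of the listing above. It is only a sketch: the tuples are not real machine code, and the `run` function, the `halt` op, and the extra jumps that end the L33 and L43 blocks are my own assumptions (the listing doesn't say how those blocks end, but without a jump they would fall through into the other architecture's bytes).

```python
# Toy model of interleaved per-architecture blocks; not real machine code.
# Each entry: (label or None, arch, op, jump target or None).
IMAGE = [
    (None,  "x86",   "code", None),
    (None,  "x86",   "je",   "L43"),   # conditional branch
    (None,  "x86",   "code", None),
    (None,  "x86",   "jmp",  "L57"),
    ("L33", "sparc", "code", None),
    (None,  "sparc", "jmp",  "L37"),   # assumed: SPARC run hops over the x86 bytes
    ("L43", "x86",   "code", None),
    (None,  "x86",   "jmp",  "L57"),   # assumed: avoids falling into L37
    ("L37", "sparc", "code", None),
    (None,  "sparc", "halt", None),    # assumed end marker for the sketch
    ("L57", "x86",   "code", None),
    (None,  "x86",   "halt", None),
]

# Map each label to the index of the entry it names.
LABELS = {label: i for i, (label, *_rest) in enumerate(IMAGE) if label}


def run(arch, start=0, take_branch=False):
    """Execute from `start` as `arch`; return the indices that were executed."""
    pc, trace = start, []
    while True:
        _label, entry_arch, op, target = IMAGE[pc]
        if entry_arch != arch:
            # This is the accident the generated jumps are supposed to prevent.
            raise RuntimeError(f"{arch} fell into {entry_arch} bytes at {pc}")
        trace.append(pc)
        if op == "halt":
            return trace
        if op == "jmp" or (op == "je" and take_branch):
            pc = LABELS[target]
        else:
            pc += 1
```

Calling `run("x86")` or `run("x86", take_branch=True)` only ever visits x86 entries, and `run("sparc", start=LABELS["L33"])` only visits SPARC ones; each architecture steps over the other's blocks the same way it would step over a string literal.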
No, that's not the way I remember they did it. I think you'd have a run of instructions for one architecture, and the only thing that interrupted a run of them would be a branch or a jump. So for example, if you had something like:
for (i=0; i<10; i++) {
// body of loop
}
Then you'd have all of the x86 instructions for the loop in contiguous bytes, after that block you'd have the MIPS instructions, then the SPARC instructions and so on.
I could be wrong about this, which is why I'm now really determined to find the article I remember, so I can figure out what they actually did. But I think there are three levels of granularity they could have used:
1. They could have had what was essentially a monolithic code block for each architecture, concatenated those blocks and had the little bit of magic at the beginning of the executable figure out which block to dispatch to, then execution would stay in that block. I don't think they did this.
2. At the other extreme, they could interleave individual instructions, like you just described. I don't think they did this either.
3. They could have runs of architecture-specific instructions corresponding to basic blocks [1], which is what I tried to describe above. I think that's what they did, but I could easily be misremembering.
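Here's a rough Python sketch of how the three layouts differ in byte order. Everything in it is hypothetical: the `BLOCKS` table, the instruction names, and the three function names are made up for illustration, and it ignores the jumps and dispatch code a real image would need. It also assumes every architecture has the same number of instructions per basic block, which real code would not.

```python
# Hypothetical per-architecture code, split into basic blocks of "instructions".
BLOCKS = {
    "x86":   [["x1", "x2"], ["x3"]],
    "mips":  [["m1", "m2"], ["m3"]],
    "sparc": [["s1", "s2"], ["s3"]],
}


def monolithic(blocks):
    """Option 1: all of one architecture's code, then the next architecture's."""
    return [ins for arch in blocks for bb in blocks[arch] for ins in bb]


def per_instruction(blocks):
    """Option 2: interleave single instructions across architectures."""
    out = []
    for bbs in zip(*blocks.values()):      # the same basic block on every arch
        for group in zip(*bbs):            # the same position on every arch
            out.extend(group)
    return out


def per_basic_block(blocks):
    """Option 3: for each basic block, emit each architecture's run in turn."""
    out = []
    for bbs in zip(*blocks.values()):
        for bb in bbs:
            out.extend(bb)
    return out


if __name__ == "__main__":
    for layout in (monolithic, per_instruction, per_basic_block):
        print(f"{layout.__name__:16}", " ".join(layout(BLOCKS)))
```

With option 3 each architecture's instructions for one basic block stay contiguous (`x1 x2`, then `m1 m2`, then `s1 s2`), so a run is only interrupted at a branch or jump, which is the property described above.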