The "compiler braindamage" is that the compiler generates instructions that load individual registers 4 bytes at a time, where it could instead generate instructions that load eight (properly aligned) bytes at once into two four-byte registers. The bytes in question are some fields in the TCP header and some fields in the PCB (the protocol control block, which holds all the state related to the TCP connection).
These instructions here are loading the header into registers:
ld loads 4 bytes into a register, but SPARC also has an ldd instruction that loads 8 bytes into two registers at once. If the compiler used ldd, these four instructions would turn into two. That gets us from 33 instructions to 31. I'm not 100% sure which two fields from the PCB can be loaded together, but there is this line:
Obviously that line compiles to multiple instructions, but what I mean is the part where tp->rcv_nxt is loaded into a register.
So probably tp->ph_sum and tp->rcv_nxt are adjacent in his version of struct tcpcb, and he thinks the compiler should use the same paired load instruction (loading two registers from 8 bytes at once) for those fields.