
The "compiler braindamage" is that it generates instructions that load registers 4 bytes at a time, where it could instead generate instructions that load eight (properly aligned) bytes into two four-byte registers at once. The bytes in question are some fields in the TCP header and some fields in the pcb (the protocol control block, which holds basically all the state for the TCP connection).

These instructions here are loading the header into registers:

        ld [%i0+4],%l3     ! load packet tcp header fields
        ld [%i0+8],%l4
        ld [%i0+12],%l2
        ld [%i0+16],%l0
That's the assembly for this:

        u_long seq = ((u_long*)ti)[1];
        u_long ack = ((u_long*)ti)[2];
        u_long flg = ((u_long*)ti)[3];
        u_long sum = ((u_long*)ti)[4];
ld loads 4 bytes at a time into a register, but there is a SPARC instruction, ldd, that loads 8 bytes into two registers at once. If the compiler used ldd, these four instructions would turn into two. That gets us from 33 to 31. I'm not 100% clear which two fields from the pcb can be loaded simultaneously, but there is this line:

        ld [%i1+72],%o0                 ! compute header checksum
and then further down

        ld [%i1+68],%o0
which I think are:

        u_long cksum = tp->ph_sum;
and

        if ((flg & FMASK) == tp->pred_flags && seq == tp->rcv_nxt) {
Obviously that line is multiple instructions, but what I mean is the part where tp->rcv_nxt is loaded into a register.

So probably tp->ph_sum and tp->rcv_nxt are adjacent in his version of struct tcpcb, and he thinks the compiler should use the same ldd instruction (loading two registers from 8 bytes at once) for those fields.
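
To make the "adjacent and properly aligned" condition concrete, here is a rough C sketch. Everything in it is an assumption for illustration: struct tcpcb_sketch is a made-up stand-in, not the real struct tcpcb, and only the offsets 68 and 72 come from the assembly above. can_pair is a hypothetical helper that checks whether two 4-byte words could be fetched by one aligned 8-byte load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment of the pcb. Only the offsets 68 and 72 come from
 * the assembly; the padding and the field-to-offset mapping are guesses. */
struct tcpcb_sketch {
    uint8_t  other_state[68];  /* stand-in for whatever precedes the fields */
    uint32_t rcv_nxt;          /* guessed to be the word at offset 68 */
    uint32_t ph_sum;           /* guessed to be the word at offset 72 */
};

/* Adjacency: the two words sit back to back in the struct. */
static_assert(offsetof(struct tcpcb_sketch, ph_sum) ==
              offsetof(struct tcpcb_sketch, rcv_nxt) + 4,
              "fields must be adjacent to share one 8-byte load");

/* One aligned 8-byte load can fetch the words at base+off_a and base+off_b
 * only if they are adjacent and the first starts on an 8-byte boundary. */
static bool can_pair(uintptr_t base, size_t off_a, size_t off_b)
{
    return off_b == off_a + 4 && (base + off_a) % 8 == 0;
}
```

Adjacency alone isn't enough: can_pair(base, 68, 72) is true only when base + 68 is a multiple of 8, and the same goes for pairing the header words at offsets 4/8 and 12/16. The excerpt doesn't show how either pointer is aligned, so this only spells out the condition; it doesn't tell us whether it held in his build.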



