
If you want to add two 32-bit integers on a 6502, you'll need something like the following, assuming this is a 32-bit integer you're actively working with and are probably about to use again fairly soon:

    CLC                  ; 2
    LDA&70 ADC&74 STA&70 ; 3 3 3 = 9
    LDA&71 ADC&75 STA&71 ; 3 3 3 = 9
    LDA&72 ADC&76 STA&72 ; 3 3 3 = 9
    LDA&73 ADC&77 STA&73 ; 3 3 3 = 9
That's for a total of 38 cycles. So on the computer I started programming on, you could do ~52,000 32-bit adds per second.
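
For anyone who doesn't read 6502, the listing above is a ripple-carry add done one byte at a time, least significant byte first. Here is a rough Python model of it. The helper names (`add32_bytewise`, `store32`, `load32`) and the plain list standing in for zero page are my own illustration, not part of the original listing, and decimal mode is ignored:

```python
def add32_bytewise(mem, dst=0x70, src=0x74):
    """Model CLC followed by four LDA/ADC/STA triples on little-endian bytes."""
    carry = 0                                        # CLC
    for i in range(4):                               # least significant byte first
        total = mem[dst + i] + mem[src + i] + carry  # LDA dst+i : ADC src+i
        mem[dst + i] = total & 0xFF                  # STA dst+i
        carry = total >> 8                           # carry flag feeds the next ADC
    return carry                                     # carry out of the top byte


def store32(mem, addr, value):
    """Write a 32-bit value as four little-endian bytes."""
    for i in range(4):
        mem[addr + i] = (value >> (8 * i)) & 0xFF


def load32(mem, addr):
    """Read four little-endian bytes back as one integer."""
    return sum(mem[addr + i] << (8 * i) for i in range(4))


zero_page = [0] * 256                  # hypothetical stand-in for &00..&FF
store32(zero_page, 0x70, 0x0001FFFF)   # first operand at &70..&73
store32(zero_page, 0x74, 0x00000001)   # second operand at &74..&77
carry_out = add32_bytewise(zero_page)
print(hex(load32(zero_page, 0x70)), carry_out)
```

The carry has to ripple through all four bytes, which is why the 6502 version can't be shortened much: each ADC depends on the carry left by the previous one.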

By comparison, for a modern Pentium, according to Intel's docs, a 32-bit add (again, on data you're using) takes 1 cycle, end to end.

    ADD ESI,EDX
So on the laptop that's in front of me, which is a crap one, you could do 2,530,000,000 32-bit adds per second. A 48,000-fold performance increase. Maybe 96,000 times, if you have no dependency chain (ADD throughput is 2 per cycle).

This ignores the fact my modern computer has 2 cores.
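
The arithmetic behind those figures, as a quick sanity check. The 2 MHz clock is an assumption on my part, inferred from ~52,000 adds per second at 38 cycles each; the comment doesn't state it. The x86 figure is the one quoted above.

```python
CLOCK_6502_HZ = 2_000_000          # assumption: inferred from ~52,000 adds/s * 38 cycles
CYCLES_PER_ADD_6502 = 38           # from the listing: 2 + 4 * 9
X86_ADDS_PER_SEC = 2_530_000_000   # quoted figure, at one dependent ADD per cycle

adds_per_sec_6502 = CLOCK_6502_HZ / CYCLES_PER_ADD_6502
speedup_dependent = X86_ADDS_PER_SEC / adds_per_sec_6502
speedup_independent = 2 * speedup_dependent   # two ADDs per cycle, no dependency chain

print(round(adds_per_sec_6502))
print(round(speedup_dependent))
print(round(speedup_independent))
```

Under that clock assumption the numbers come out at roughly 52,600 adds per second and a ratio of about 48,000 (or about 96,000 with independent adds), which matches the rounded figures in the comment.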




And that's loading/storing to/from the zero page (the first 256 bytes of memory). Loading/storing from higher addresses requires 4 cycles.
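
To put a number on that: a small tally of the same CLC plus four LDA/ADC/STA triples, once with zero-page operands (3 cycles each) and once with absolute addressing (4 cycles each). The `add32_cycles` helper is just for illustration.

```python
CYCLES = {"CLC": 2, "zero_page": 3, "absolute": 4}


def add32_cycles(mode):
    """Cycles for CLC plus four LDA/ADC/STA triples in the given addressing mode."""
    per_byte = 3 * CYCLES[mode]          # LDA + ADC + STA
    return CYCLES["CLC"] + 4 * per_byte


print(add32_cycles("zero_page"), add32_cycles("absolute"))
```

So the same add on operands outside zero page costs 50 cycles instead of 38.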

But "ADD ESI,EDX" is adding two registers, isn't it? So I think you need to include loading those registers from memory and storing the result back for a fairer comparison.

I haven't touched 6502 assembly in over 20 years. Brings back memories. :-)


This is working data, so you'd keep it in a register if possible. Sadly that just happens not to be possible on the 6502 :)


Two bytes can be kept in the X and Y registers. Immediate load and add instructions only use two cycles.

    CLC                    ;         2
            ADC #b1 STA&70 ; 0 2 3 = 5
    LDA #a2 ADC #b2 TAY    ; 2 2 2 = 6
    LDA #a3 ADC #b3 TAX    ; 2 2 2 = 6
    LDA #a4 ADC #b4        ; 2 2   = 4
                           ; total  23 cycles


If you're adding constants, you might as well load each byte of the result directly, when you need it. (I can't tell where the LSB is coming from in this code - perhaps it isn't a constant? - this example doesn't resemble any code I've ever had to write.)

Perhaps the code is intended to be modified at runtime, but then you'd still want one of the operands loaded from memory, I think (otherwise why not just precalculate the results?), and I've generally found the (fairly substantial) fixed expense of self-modifying code not to be worth it anyway.

Anyway, overall I think you're being a bit unfair to the x86 with this comparison.


Yes, the comparison is only relative; what matters is the order of magnitude, and it may as well be 10,000x better.

You're even being generous to the 6502: you're using a simple add. Now compare with SIMD instructions.



