If you want to add two 32-bit integers on a 6502, you'll need something like the following, assuming these are 32-bit integers you're actively working with and are probably about to use again fairly soon:
That's for a total of 38 cycles. So on the computer I started programming on, you could do ~52,000 32-bit adds per second.
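For anyone checking the arithmetic, here is a rough Python sketch of where a figure like 38 cycles can come from. It assumes a byte-at-a-time sequence of CLC followed by four rounds of LDA/ADC/STA on zero-page operands, which is a guess at the routine and not necessarily the exact code above. The 2 MHz clock is also an assumption, picked because it is consistent with ~52,000 adds per second.

```python
# Documented 6502 cycle counts for zero-page addressing.
CYCLES_ZP = {"CLC": 2, "LDA": 3, "ADC": 3, "STA": 3}

# Assumption: a 2 MHz clock (not stated in the post; inferred from ~52,000 adds/s).
CLOCK_HZ = 2_000_000


def add32_cycles(cycles):
    """Cycles for one 32-bit add done as four 8-bit load/add/store steps.

    Assumed sequence: CLC once, then LDA/ADC/STA for each of the 4 bytes,
    with the carry flag propagating between the ADCs.
    """
    per_byte = cycles["LDA"] + cycles["ADC"] + cycles["STA"]
    return cycles["CLC"] + 4 * per_byte


total_cycles = add32_cycles(CYCLES_ZP)
adds_per_second = CLOCK_HZ // total_cycles

print(total_cycles)     # 38
print(adds_per_second)  # 52631
```

Under those assumptions the total is 2 + 4 × 9 = 38 cycles, and a 2 MHz part gets through roughly 52,600 of these per second.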
By comparison, for a modern Pentium, according to Intel's docs, a 32-bit add (again, on data you're using) takes 1 cycle, end to end.
ADD ESI,EDX
So on the laptop that's in front of me, which is a crap one, you could do 2,530,000,000 32-bit adds per second. A 48,000-fold performance increase. Maybe 96,000 times, if you have no dependency chain (ADD throughput is 2 per cycle).
This ignores the fact my modern computer has 2 cores.
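The ratios are easy to reproduce. This is only a back-of-the-envelope sketch using the figures quoted above (52,000 adds per second on the 6502, a 2.53 GHz clock, 1 cycle per ADD); the `speedup` helper and its parameters are made up for illustration.

```python
ADDS_PER_SEC_6502 = 52_000   # figure quoted in the post
LAPTOP_HZ = 2_530_000_000    # 2.53 GHz, one 32-bit ADD per cycle


def speedup(adds_per_cycle=1, cores=1):
    """Ratio of laptop 32-bit adds per second to 6502 adds per second."""
    return LAPTOP_HZ * adds_per_cycle * cores / ADDS_PER_SEC_6502


print(round(speedup()))                           # one dependent chain of ADDs
print(round(speedup(adds_per_cycle=2)))           # independent ADDs, 2 per cycle
print(round(speedup(adds_per_cycle=2, cores=2)))  # both cores busy
```

That gives a bit under 49,000-fold for a dependent chain, about twice that with two independent ADDs per cycle, and twice again if both cores are kept busy.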
And that's loading/storing to/from the zero page (the first 256 bytes of memory). Loading/storing from higher addresses requires 4 cycles.
But "ADD ESI,EDX" is adding two registers, isn't it? So I think you need to include loading those registers from memory and storing the result back for a fairer comparison.
I haven't touched 6502 assembly in over 20 years. Brings back memories. :-)
If you're adding constants, you might as well load each byte of the result directly, when you need it. (I can't tell where the LSB is coming from in this code - perhaps it isn't a constant? - this example doesn't resemble any code I've ever had to write.)
Perhaps the code is intended to be modified at runtime, but then you'd still want one of the operands loaded from memory, I think (otherwise why not just precalculate the results?), and I've generally found the (fairly substantial) fixed expense not to be worth it anyway.
Anyway, overall I think you're being a bit unfair to the x86 with this comparison.