If you're moving memory around in bank 0 (or have memory mapped there), you can point both the direct page register and the stack pointer anywhere in bank 0, then read/write through direct page addressing and through pushes/pulls.
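With a 16-bit accumulator, LDA dp followed by PHA is 4 + 4 = 8 cycles per word, or 4 cycles per byte (assuming the low byte of D is zero; otherwise each direct page access costs one extra cycle). The best case is when you know beforehand that the data is constant, e.g. LDA #0, PHA, PHA, ...: each 16-bit PHA is 4 cycles, so 2 cycles per byte!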
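Here is a rough sketch of what the LDA dp / PHA trick might look like. SRC and DST are hypothetical bank 0 labels, the syntax is generic and will need adjusting for your assembler (including telling it A is 16-bit), and it assumes native mode with nothing else needing the stack while S is repurposed. Treat it as an illustration, not tested code.

```
; Sketch: copy 8 bytes from SRC to DST, both in bank 0.
; SRC and DST are placeholder labels (assumptions, not from any real project).
; SRC should be page-aligned to avoid the extra direct page cycle.
        PHP             ; save flags (register widths, interrupt mask)
        SEI             ; mask IRQs while S points at data (does not mask NMI)
        REP #$30        ; 16-bit A and X/Y
        PHD             ; save the real direct page register
        TSX             ; X = real stack pointer
        LDA #SRC
        TCD             ; D = SRC, so "LDA $nn" reads SRC+nn
        LDA #DST+7
        TCS             ; S = last byte of the destination
        LDA $06         ; the stack grows downward, so copy from the top down
        PHA             ; writes DST+6 and DST+7
        LDA $04
        PHA             ; writes DST+4 and DST+5
        LDA $02
        PHA             ; writes DST+2 and DST+3
        LDA $00
        PHA             ; writes DST+0 and DST+1
        TXS             ; restore the real stack pointer
        PLD             ; restore the direct page register
        PLP             ; restore flags
```

The setup and teardown cost is fixed, so the unrolled LDA/PHA pairs only pay off for larger blocks.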
For general-purpose copying, MVP and MVN are easier to use and have better code density, though at 7 cycles per byte they are slower.