Back in the ARM2 days (I never coded for ARM1 - they made very few of them) you knew exactly how long each instruction would take to execute: 1 cycle for arithmetic, 4 cycles to load/store a single value, and 4 cycles + 1 per register for load/store multiple, if I remember correctly.
You could use the barrel shifter to shift/rotate an operand as part of those 1-cycle instructions, using up a whole zero extra cycles!
Nowadays, with superscalar ARMs with branch prediction and onboard caches, things aren't nearly so straightforward.
There is a nice writeup about what the barrel shifter could do for you here:
https://www.davespace.co.uk/arm/introduction-to-arm/barrel-s...
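To give a flavour of the kind of thing the barrel shifter let you fold into one instruction, here is a rough sketch in C. It is my own illustration, not taken from the writeup: the function names are made up, and the ARM instructions in the comments are what I would expect a hand-coder to write, not guaranteed compiler output.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* x * 5 as x + (x << 2).
   On ARM this is one instruction: ADD r0, r1, r1, LSL #2 */
static uint32_t times5(uint32_t x) {
    return x + (x << 2);
}

/* x * 7 as (x << 3) - x.
   On ARM this is one instruction: RSB r0, r1, r1, LSL #3 */
static uint32_t times7(uint32_t x) {
    return (x << 3) - x;
}

/* Rotate right by n bits, the same operation as the shifter's ROR option.
   n is masked to 0..31 so the C shifts below stay well defined. */
static uint32_t ror32(uint32_t x, unsigned n) {
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}
```

For example, times5(6) gives 30, times7(6) gives 42, and ror32(0xF1, 4) gives 0x1000000F.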