Also, wouldn't little-endianness generally benefit 16-bit arithmetic operations performed on 8-bit ALUs? Because (correct me if I'm wrong) a carry always propagates from the LSB to the MSB, processing the LSB before the MSB allows you to implement the carry mechanism more efficiently.
Yes, though few 8-bit CPUs supported 16-bit arithmetic. Most required two 8-bit operations with carry instead, and in that case there's no difference between little and big endianness: in both cases the LSB has to be calculated first.
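To make that concrete, here is a rough sketch in Python of a 16-bit add built from two 8-bit adds with carry. The function names (`add8`, `add16_le`, `add16_be`) and the two-element byte lists standing in for memory are made up for illustration and don't model any particular CPU. The point is only that the low byte is added first in both layouts; the byte order just changes which index it lives at.

```python
def add8(a, b, carry_in=0):
    """One 8-bit add with carry-in; returns (8-bit result, carry-out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16_le(x, y):
    """x and y are [low, high] byte lists (little-endian layout)."""
    lo, carry = add8(x[0], y[0])         # low byte first (index 0)
    hi, carry = add8(x[1], y[1], carry)  # then high byte plus carry
    return [lo, hi], carry


def add16_be(x, y):
    """x and y are [high, low] byte lists (big-endian layout)."""
    lo, carry = add8(x[1], y[1])         # low byte still first (index 1)
    hi, carry = add8(x[0], y[0], carry)  # then high byte plus carry
    return [hi, lo], carry


# 0x12FF + 0x0001 in both layouts
print(add16_le([0xFF, 0x12], [0x01, 0x00]))
print(add16_be([0x12, 0xFF], [0x00, 0x01]))
```

Either way it's the same two adds in the same order, so when the programmer issues the two 8-bit operations explicitly, the memory layout doesn't matter.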
The big-endian 6809 supports 16-bit arithmetic, and is indeed punished for it: 16-bit ADD and SUB cost an extra cycle compared to a little-endian design.