While I haven no personal experience writing aarch64 assembler code my experienc...

While I haven no personal experience writing aarch64 assembler code my experience with ARM v6m an v7m makes me doubt your implied insult that ARM just failed/didn't give a fuck about their instruction set. Thumb 1 and 2 are well designed instruction sets optimized for a certain kind of uarch. Almost all quirks exposed to the low level programmer are there for good reasons and while some of the constraints are a pose a challenge for compiler writers they are not beyond the capabilities of GCC or LLVM. There are several possible reasons for ARM to return to a fixed length 32 bit encoding e.g. to allow very wide OoO designs like Apple's Firestorm cores or because the gain is smaller for 64 bit code with larger constants and better served by PC relative constant pools. And while the quirky LDMIA function prologue is very flexible, appeals to me as assembler programmer and saves code space having a single instruction potentially modify most integer registers as well as change the program counter and the active instruction set is hard to implement well while easier to implement register pair load/store instructions are enough for most common instruction sequences. The tradeoff was different for in-order ARM2/3 CPUs with single ported memory and a tiny unified cache (if that).