I'm talking specifically about the workloads I was interested in at the time, where "cache optimised" means the code took pains to take advantage of larger L1s and to get L1 hits. But this was a general problem too, and it was noted at the time by many people.
AMD's smaller L1 was a definite negative at the time. This was back when hyperthreading could be a net negative because of the reduced L1 cache per thread, so we would turn that off too.
