I think AMD has always had the talent, but made a bad bet with Bulldozer's architecture of two cores sharing a front-end and FPU[1]. I also think desktop software (specifically games) moved faster than some anticipated when it came to taking advantage of multiple cores, causing them to seem resource starved.
[1] At least that's how I think they work. One decode and FPU per pair of integer cores, right?
Bulldozer's failure was really one of market positioning. It would have been much better received if the CMT modules had been presented as a single, dual-threaded core, positioned against an Intel HTed core, rather than as two separate cores.
While I generally agree that CMT threads are not real "cores" (and have never been presented as such by any other company that has explored CMT), the real problem with Bulldozer was its frequency-optimized, deeply pipelined design. AMD literally repeated the exact mistake that Intel made with NetBurst. It would never have been good no matter how you positioned it.