I’ve literally spent 100s of hours with it. I’m mystified why so many people use the “you’re holding it wrong” explanation when somebody points out real limitations.
You might consider that other people have also spent hundreds of hours with it, and have seen it correctly solve tasks that cannot be explained by regurgitating something from the training set.
I'm not saying that your observations aren't correct, but this is not a binary. It is entirely possible that the tasks you observe the models on are exactly the kind where they tend to regurgitate. But that doesn't mean that it is all they can do.
Ultimately, the question is whether there is a "there" there at all. Even if 9 times out of 10, the model regurgitates, but that one other time it can actually reason, that means that it is capable of reasoning in principle.
When we've spent time with it and gotten novel code, then if you claim that doesn't happen, it is natural to say "you're holding it wrong". If you're just arguing it doesn't happen often enough to be useful to you, that likely depends on your expectations and how complex tasks you need it to carry out to be useful.
In many ways, Claude feels like a miracle to me. I no longer have to stress over semantics or searching for patterns I can recognize and work with, but I’ve never actually coded them myself in that language. Now, I don’t have to waste energy looking up things that I find boring