ChatGPT (3.5) seems to do some rudimentary backtracking when told it's wrong enough times. However, it does very poorly in the logic department. LLMs can't seem to pick out nuance or separate similar ideas that are technically/logically different.
They're good at putting together things that are commonly found together, but not so good at separating concepts back out into more detailed sub-pieces.
I've tested GPT-4 on this, and it can be induced to give up on certain lines of argument after recognising they aren't leading anywhere and to try something else. But it would take thousands of restarts (and I'm really understating here) to get through even fairly simple problems that professional mathematicians solve routinely.
Currently the context length isn't even long enough for it to remember what problem it was solving. I've tried to come up with a bunch of ways around this, and they all fail for one reason or another. LLMs are really a long, long way from managing this efficiently, in my opinion.
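To give a sense of what I mean by "restarts" and by working around the context limit, here's a rough sketch in Python. Everything in it is a made-up placeholder (`ask_llm`, `looks_solved`, `dead_end` stand in for whatever model call and checks you'd actually use), not a real client or a claim about how GPT-4 should be driven:

```python
import random

def ask_llm(messages):
    # Hypothetical stand-in for a real chat-completion call.
    # Returns a canned "attempt" string so the sketch runs on its own.
    return random.choice(["...still stuck...", "...dead end...", "PROOF COMPLETE"])

def looks_solved(attempt):
    return "PROOF COMPLETE" in attempt

def dead_end(attempt):
    return "dead end" in attempt

def solve_with_restarts(problem, max_restarts=1000):
    for restart in range(max_restarts):
        # Re-inject the full problem statement on every attempt, because the
        # model won't reliably keep it in context across a long transcript.
        messages = [{"role": "user",
                     "content": f"Problem: {problem}\nAttempt a solution."}]
        for step in range(20):  # cap the depth of any single line of argument
            attempt = ask_llm(messages)
            if looks_solved(attempt):
                return restart, attempt
            if dead_end(attempt):
                break  # abandon this line of argument and restart fresh
            messages.append({"role": "assistant", "content": attempt})
            messages.append({"role": "user",
                             "content": "That isn't leading anywhere. Try a different approach."})
    return max_restarts, None

restarts_used, result = solve_with_restarts("toy problem statement")
print(restarts_used, result)
```

Even with something like this, the checks for "solved" and "dead end" are the hard part, and that's one of the ways my attempts kept failing.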
I've pasted docs and error messages into GPT-3.5 and it's admitted it was wrong, but usually it'll cycle through a few different answers before coming back to the original and looping.
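A cheap way to see the looping I'm describing: keep every answer the model has already given and flag when it comes back to one. Sketch only, under the same assumption that `ask_llm` is a placeholder for whatever call you'd actually make:

```python
def detect_answer_loop(ask_llm, error_message, max_rounds=10):
    # Paste the error message back in each round and stop as soon as the
    # model repeats an answer it has already given.
    seen = []
    prompt = f"This code fails with:\n{error_message}\nSuggest a fix."
    for _ in range(max_rounds):
        answer = ask_llm(prompt).strip()
        if answer in seen:
            return seen + [answer]  # the model has looped back to an earlier answer
        seen.append(answer)
        prompt = f"That fix still fails with:\n{error_message}\nSuggest a different fix."
    return seen

# Example with a canned responder that loops after two suggestions:
canned = iter(["try pip install foo", "pin foo==1.2", "try pip install foo"])
print(detect_answer_loop(lambda _prompt: next(canned), "ModuleNotFoundError: foo"))
```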