This is likely the result of training it to avoid bullshitting. It used to give confident garbage, and they're trying to stem the flow. The probable side effect is more hedged, less certain responses on complex tasks.
Here is the crux. When you ask about non-code stuff, it would confidently lie, and that is bad. When you ask about code, who cares if it doesn't work on the first try? You can keep feeding it the error messages and asking for fixes, and it will get there, or close enough.
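That loop is basically mechanical. A rough sketch of what it looks like in Python, where ask_model() is a hypothetical stand-in for whichever chat API you happen to be using:

    import subprocess
    import sys

    def ask_model(prompt: str) -> str:
        """Hypothetical call to a code-generating model; returns Python source."""
        raise NotImplementedError("wire this up to your chat API of choice")

    def iterate_until_it_runs(task: str, max_rounds: int = 5) -> str:
        code = ask_model(task)
        for _ in range(max_rounds):
            # run the generated code and capture any traceback
            result = subprocess.run([sys.executable, "-c", code],
                                    capture_output=True, text=True)
            if result.returncode == 0:
                return code  # it ran without blowing up
            # paste the error back in and ask for a fix
            code = ask_model(
                f"This code failed:\n{code}\n\nError:\n{result.stderr}\nPlease fix it."
            )
        return code  # "close enough" after max_rounds attempts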
It's obvious what's happening: ChatGPT is taking the non-code route and Copilot the code route, and Microsoft gets to charge users twice.