Subjectively, it doesn’t feel AI generated; it feels like a human wrote it.
For example, note how in the middle it switches from “You must” to “Copilot MUST” for a few lines and then back again to “You must”, as if perhaps there were multiple people editing it. That kind of inconsistency seems human.
I don’t think the switch from “you” to “Copilot” is a hallucination OR a mistake. I think this shows that there are two systems being used together: the original Copilot model, and the chat agent. The chat agent is being given instructions for its own behavior, but I suspect it is also evaluating and incorporating the output from the Copilot model (also stuffed into the prompt).
It’s possible, but my feeling is that once an LLM flips styles, it tends to stick with the new style afterwards. And the more advanced LLMs (I could be wrong, but IIRC Copilot chat is supposed to be GPT-4?) are much less likely to flip styles in the middle. Bigger models tend to be more coherent.
I don’t think current SOTA LLMs have passed the Turing test. AI-generated text still feels “off”: formulaic and flat, without the punch of human writing.