Yeah, but right or wrong, sensemaking and long-term coherence account for very few bits from the compression model's point of view. You get much larger gains simply by noticing that 'e' occurs far more often than 'z' and coding accordingly.
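A rough back-of-the-envelope sketch of that point in Python, using standard published letter-frequency estimates (the numbers are illustrative, not from this thread): an order-0 model that only knows letter frequencies already closes most of the gap from raw 8-bit bytes, while everything "smart" lives in the last fraction of a bit.

```python
from math import log2

# Approximate relative frequencies of letters in English text
# (standard ETAOIN-style estimates; treat them as illustrative).
freq = {
    'e': 0.127, 't': 0.091, 'a': 0.082, 'o': 0.075, 'i': 0.070,
    'n': 0.067, 's': 0.063, 'h': 0.061, 'r': 0.060, 'd': 0.043,
    'l': 0.040, 'c': 0.028, 'u': 0.028, 'm': 0.024, 'w': 0.024,
    'f': 0.022, 'g': 0.020, 'y': 0.020, 'p': 0.019, 'b': 0.015,
    'v': 0.010, 'k': 0.008, 'j': 0.002, 'x': 0.002, 'q': 0.001,
    'z': 0.001,
}

total = sum(freq.values())
probs = [f / total for f in freq.values()]  # renormalize to sum to 1

# Entropy of an order-0 model that only knows letter frequencies
h0 = -sum(p * log2(p) for p in probs)

print("fixed 8-bit bytes:        8.00 bits/char")
print(f"uniform over 26 letters:  {log2(26):.2f} bits/char")
print(f"order-0 letter freqs:     {h0:.2f} bits/char")
# Shannon-style estimates put English, with full context, near ~1 bit/char,
# so frequency counting alone recovers most of the savings from 8 bits down;
# long-range coherence is fighting over the remaining fraction of a bit.
```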
It's been a fundamental issue with A.I. since the 1960s that we often use metrics that approach an asymptote: the claim is that somewhere between 0.94 and 0.92 bits per character you are finally forced to eke out the last of the predictability by never generating "wrong" text, but I don't believe it.
Compression and AI have a lot of overlap; this article makes a potent argument. However, there's still more to intelligence than pattern recognition. We might, now, have agents that can say "all cats are grey when it's dark." They don't come out with these things on their own, yet.
What do you mean by "on their own"? I would bet that the first human to utter that sentence had several observations of cats in the dark in their training set.