I’m sure it depends on the type of prediction task, right?
Current LLMs are trained to predict typical human prose from samples pulled from the internet. So it isn’t as if they are sacrificing quality for quantity: a large body of text from the internet is a very good representation of typical human prose. Whether that prose is well written, or whether what it describes accurately reflects, like, actual physical reality, is another issue.
Maybe they want to predict something with, like, lower dimensionality but more utility than a paragraph of fiction.