Wire news services come to mind as an alternative.

Wowfunhappy · on Sept 1, 2023

Perhaps, but the quantity of data is comparatively minuscule.

mejutoco · on Sept 1, 2023

The quantity of information is probably higher though :)

Wowfunhappy · on Sept 1, 2023

Of course, but for training data, current LLMs seem to need quantity above all else.

bee_rider · on Sept 1, 2023

I’m sure it depends on the type of prediction task, right?

Current LLMs are trying to predict typical human prose from samples pulled from the internet. So it isn’t as if they are sacrificing quality for quantity. A bunch of text from the internet is a very good representation of typical human prose. Whether it is well written or the descriptions contained in the prose accurately represent, like, actual physical reality is another issue.

Maybe they want to predict something with, like, less dimensionality but more utility than a paragraph of fiction.

hef19898 · on Sept 1, 2023

Agreed, not that anyone really seems to care, though.