
Wire news services come to mind as an alternative.



Perhaps, but the quantity of data is comparatively minuscule.


The quantity of information is probably higher though :)


Of course, but for training data, current LLMs seem to need quantity above all else.


I’m sure it depends on the type of prediction task, right?

Current LLMs are trying to predict typical human prose from samples pulled from the internet. So it isn’t as if they are sacrificing quality for quantity. A bunch of text from the internet is a very good representation of typical human prose. Whether it is well written or the descriptions contained in the prose accurately represent, like, actual physical reality is another issue.

Maybe they want to predict something with, like, less dimensionality but more utility than a paragraph of fiction.


Agreed, not that anyone really seems to care, though.



