
We need the same with text to train Large Language Models



That's pretty much what Red Pajama is: https://simonwillison.net/2023/Apr/17/redpajama-data/



It contains captions. That's the alt text. LAION is used to train LLMs.



