Hacker News

Have you considered working with the Internet Archive on this across their corpus? They are open to such work being done. And if some of the material you need isn’t in the archive, let’s get it in there.

I have not, but I am going to file the idea; it would indeed be a good starting point.

