
I wonder if anyone is working on an archive?



If anyone is, it would probably be the people over at ArchiveTeam / URLTeam: https://wiki.archiveteam.org/ https://wiki.archiveteam.org/index.php/URLTeam


It looks like they're active, but not working on git.io URLs: https://tracker.archiveteam.org:1338/status

I have no idea how to submit shortened URLs to URLTeam, but if anyone can find out I'll be happy to scrape URLs out of that list of Google Scholar PDFs.
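For the scraping step itself, here is a minimal sketch of pulling git.io links out of text that has already been extracted from the PDFs. The PDF-to-text step is not shown, the function name `extract_git_io_links` is hypothetical, and the set of characters allowed in a short code is an assumption on my part rather than anything documented.

```python
import re

# Assumption: git.io short codes are made of letters, digits, hyphens and underscores.
GIT_IO_RE = re.compile(r"https?://git\.io/[A-Za-z0-9_-]+")


def extract_git_io_links(text):
    """Return the unique git.io URLs found in text, in order of first appearance."""
    seen = set()
    links = []
    for match in GIT_IO_RE.finditer(text):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


if __name__ == "__main__":
    sample = "Code: https://git.io/abc123. Mirror at http://git.io/x-Y_z, see also https://git.io/abc123"
    for link in extract_git_io_links(sample):
        print(link)
```

Deduplicating while keeping order makes it easier to merge results from many papers into one list before handing it to anyone who can archive the targets.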


That seems unlikely, because scraping research papers is very difficult and possibly illegal. See https://en.wikipedia.org/wiki/Aaron_Swartz for example.


I think they meant an archive of git.io links.


I mean that it is hard to scrape the git.io links used in research papers in order to build the archive. Of course, if GitHub provided a DB dump, that would help everyone a lot.



