Hacker News new | past | comments | ask | show | jobs | submit login

Care to make it publicly available? Or is that not permitted on your dataset? Certainly, there’s a lot more PDFs out there than 8TB. I bet there’s a lot of redundancy in yours, but doesn’t dedup well because of all the images.



I have >10TB of magazines I've collected so far, and I could probably source another 50TB if I had the time. I'm working on uploading them, but I've had too much on my plate lately: https://en.magazedia.wiki/

There is a significant issue with copyright, though. I'll remove anything with a valid DMCA, but 99.9% of the world's historical magazine issues are now in IP limbo as their ownership is probably unknown. Most of the other .1% aren't overly concerned as distribution is their goal and their main income is advertising, not sales.


I think that would be legally iffy for the stuff like collections of old magazines that were purchased on CD/DVD and such :/




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: