
It might be that they figure the Internet Archive is doing a great job, and there's no use in doing the same job twice. They do contribute, being one of the largest (the largest?) book sponsors.



I suspect that at least part of it is that mirroring copyrighted content is a gray area of law. The fact that the Internet Archive is a non-profit archive may give them some leeway and, in any case, they're a less tempting target than Google would be. Look at the ongoing issues that Google has around news sites for example.


Ideally the Internet Archive stays out of jurisdictions that can force them to remove most types of content. The non-profit aspect won't help them in the least.

Being all over the planet in terms of business, infrastructure and physical presence is where Google acting as an archive would fail very badly. They might be the absolute last organization you want serving as that entity.

The IA in theory could operate all of its infrastructure and organization out of a preferential jurisdiction (or a few, so as to have backups in case one favorable location goes bad legally/politically), and archive anything it wants to from around the world while entirely ignoring the local laws from a given place (eg the EU, or China, or Brazil, New Zealand, or Turkey, or wherever).


>The non-profit aspect won't help them in the least. //

AIUI one of the tests for Fair Use [in USA] looks at whether use is non-commercial (not the same as non-profit; commercial use can be free-gratis, for example; nor is it a sufficient condition in itself), so it could be a key element in a court decision I feel; what's more important perhaps is that people are less inclined to sue non-profits because of the potential harm to their own public image.

Google are pretty canny, I'd expect them to let IA lead - eg assumed consent with old books - in order to set a non-binding precedent so that they can go to the press should they be challenged and say "well we just followed what the noble souls of IA are doing, and this court decision will harm the IA".

Last I looked, IIRC, Papua New Guinea wasn't a signatory to copyright treaties, but I think they were planning on signing. There's probably a country in similar circumstances that would be a reasonable place for holding a backup archive that includes the stuff less liberal regimes want you to ditch.


Though the legal situation is a bit murky even in the US. After all, I can't set up a "Comics Archive" and start populating it with all sorts of copyrighted comic strips and expect not to hear from the publishers. But as a non-profit who isn't making money off the content it mirrors, respects robots.txt even retroactively, and will generally honor takedown requests that are remotely legit, it gets cut a lot of slack that a corporation doing this for profit-making purposes wouldn't.


>Though the legal situation is a bit murky even in the US.

I don't think it's all that murky.

It's my understanding that IA is allowed to have all that copyrighted stuff because it took the effort to legally register as a real library.


As far as I know, there's no such registry. There are exceptions under Section 108 for institutions that fit a certain definition of library, but from what I can tell as a non-lawyer, they don't allow the kind of indiscriminate reproduction that the IA engages in: https://www.law.cornell.edu/uscode/text/17/108


From Wikipedia: "The Archive is a member of the International Internet Preservation Consortium and was officially designated as a library by the state of California in 2007."

Related newspaper article: http://old.post-gazette.com/pg/07175/796164-96.stm


Copyright is federal law; I don't think the State of California can exempt any institution from it.


The short answer is that libraries do not get a magical exemption to make copies of copyrighted works although they have some limited exemptions (that seem to have mostly been written with physical artifacts in mind). For example, a library cannot rip a DVD and make it available to the public with no usage restrictions.

IANAL but there is maybe an argument to be made that the IA can mirror web sites for preservation purposes but then could only make it available to one researcher at a time.


Ideally, any archive respects the wishes of copyright holders and we don't need to rely on legislation. I certainly want control over my data, and thankfully most legal systems are on my side. My rights over my data trump other people's need to preserve absolutely everything, no matter how trivial. Like the collection of personal letters my grandparents wanted destroyed after their death, which did not end up in a library vault and the historical significance of which is lost to time.

I pity future historians who will need to wade through the petabytes of crap like so much landfill because we outsourced curating it to the future. Because just maybe the rubbish I spout on my personal blog will be of interest to future generations (hint, it isn't, and I'll be spinning in my grave from embarrassment if it is). I doubt they will wade through it, since we have the ability to leave future generations actual historic records and not force them to learn about us from fragments decoded by archaeologists.


>there's no use to do the same job twice

And why bother looking both ways before crossing the street, or even testing your backups?


Checking left and right are two jobs. And how often do you test each of your backups? Any of them might get corrupted at any moment - is perpetual validation the answer?


Sure, why not? I do regular automated validation of my backups against each other using "rsync --checksum --dry-run" and get notified if anything beyond a tiny threshold is out of whack. (The threshold being due to small files updated between the two backup runs)
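For anyone who wants the same idea without rsync, here is a rough Python sketch of that kind of check. It is not what rsync does internally, just a stand-in: hash every file in two backup trees, list the paths that differ or exist on only one side, and flag the pair if the share of differing files goes over a threshold. The function names and the 1% default threshold are made up for illustration, and the notification step is left out.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_backups(a: Path, b: Path) -> tuple[list[str], int]:
    """Return (paths that differ or exist on one side only, total distinct paths)."""
    da, db = tree_digests(a), tree_digests(b)
    all_paths = sorted(da.keys() | db.keys())
    # A path missing on one side yields None from .get(), so it counts as differing.
    differing = [p for p in all_paths if da.get(p) != db.get(p)]
    return differing, len(all_paths)


def backups_diverged(a: Path, b: Path, threshold: float = 0.01) -> bool:
    """True when the fraction of differing files exceeds the threshold.

    The threshold is a hypothetical allowance for files that changed
    between the two backup runs.
    """
    differing, total = compare_backups(a, b)
    return total > 0 and len(differing) / total > threshold
```

Something like `backups_diverged(Path("/mnt/backup1"), Path("/mnt/backup2"))` (paths are placeholders) could then run from cron and send a message when it returns True.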



