
Robots.txt is the only way to opt-out of the Wayback Machine.



A serious question: why should you be allowed to 'opt-out' of history? Is this really your call, as a website owner?


Or, through a more historical lens: a lot of history has been learned by poring over the intimate private correspondence of historical figures, most of whom, I imagine, would be quite perturbed to see their love letters on display in museums.

Should historians not read private letters sent long ago? Should they swear to some oath and take a moral stand that such things shouldn't be examined?

If the answer is "No, they should read them," then why, for the historical record, should we observe robots.txt? Isn't it the same thing?


There's a technical reason - the blocked pages might open infinite URL spaces or bring the site down (crawler hitting /cgi-bin).


That is NOT a technical reason.

Technically speaking a robots.txt that says

  User-agent: *
  Disallow: /

means that you should not crawl the site today. It should have no effect whatsoever on displaying pages that WERE crawled before the timestamp on the robots.txt file.
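As a rough sketch of that reading (not how the Wayback Machine actually behaves), the two questions can be kept separate: "may I crawl now?" is answered by the current robots.txt, while "may I display an old capture?" depends only on whether the capture predates the robots.txt. The function names, the example bot name, and the dates below are hypothetical; only `urllib.robotparser` is a real standard-library module.

```python
from datetime import datetime, timezone
from urllib.robotparser import RobotFileParser

# The robots.txt from the comment above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl_now(robots_txt: str, user_agent: str, url: str) -> bool:
    """Apply the robots.txt rules to a crawl happening today."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def may_display(capture_time: datetime, robots_seen_at: datetime) -> bool:
    """Hypothetical policy: captures made before the robots.txt
    appeared stay visible; later ones would not exist at all."""
    return capture_time < robots_seen_at


# Assumed timestamps, for illustration only.
old_capture = datetime(2010, 1, 1, tzinfo=timezone.utc)
robots_added = datetime(2012, 6, 1, tzinfo=timezone.utc)

crawl_ok = may_crawl_now(ROBOTS_TXT, "example-bot", "https://example.com/page")
display_ok = may_display(old_capture, robots_added)
```

Under this interpretation `crawl_ok` is false and `display_ok` is true: the blanket Disallow stops new crawls but says nothing about pages captured earlier.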


Actually, if you want to interpret robots.txt that way, it raises the problem of "how long can I consider a robots.txt valid for?"


Those corner cases can also exist on sites that don't have a robots.txt, and those sites still have to be crawled correctly.


So I take it you're for Facebook documenting as much of everyone's lives as possible, for historical reasons?


Probably not.

I don't think there is an inarguable answer to my rhetorical question. People's intents and wishes do matter.

But there also is an idea from antiquity about the public good and the commons. I guess at some point my personal wishes get trumped by this overarching principle.

The whole point of the question was that someone would say "You may not read my love letters" and then society said "Too bad, we're doing it anyway. And reprinting them in high school textbooks."

Is that ok? I don't think there's a clear line and I do think there are probably moral boundaries.

I'm by no means Lawrence Lessig, and I'm really not experienced at this type of discourse. I do think there are many important questions here that we may need to rethink.


One might nitpick that there was initially some distinction between the publicly available internet and a private Facebook, although the latter seems to be making strides to narrow this gap.


Yes, because secrets and forgetting can be important.

It's not our cultural tradition that every written work (train schedules, greeting cards, friendly notes, lolcats, etc.) must be archived at the Library of Congress. I'm not sure that it'd be a good idea.

Archive.org is a good idea.


Since it's your bandwidth and your content, sure.


Copyright law says it is.


No one is stopping you from archiving my websites if you think the data will have some importance. It seems like you're suggesting that archive.org is the universal keeper of history and everyone should agree with that idea.


No, you can just mail them.



