Hacker News new | past | comments | ask | show | jobs | submit login

How about you respect the robots.txt until the IP address where it is hosted changes. Once the IP has changed, then any new robots.txt exclusions apply only to the new pages not the archived pages under the old IP, which continue respecting the old archived robots.txt.

The IP address changing is a pretty solid indicator that control of that content has moved to a new organisation. Note this does not always coincide with the domain name owner changing.

A scenario that I can imagine becoming litigious: company owns a domain for promoting some product and they use robots.txt to prevent copies. The product reaches end of life and domain is allowed to expire. Someone else buys the domain and starts hosting content with no robots restriction. Archive.org start to display pages from the old company. Company then sues archive.org for copyright violation.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: