> scrapers slow down their site and that's why they use robots.txt

A poorly written scraper may really slow down your site, especially if the site wasn't built to be scraped repeatedly. There should be a way for the website owner to specify the frequency scrapers are expected to follow (via a robots.txt-like spec).

But website owners cannot demand unreasonable frequencies (such as once a year!), and what constitutes unreasonable is up for debate.




I don't think a poorly written scraper would follow robots.txt rules according to spec. So, in any case, the site should have other measures (rate limiting?) anyway.


Additionally, if excessive scraping became an issue for my site, I'd consider rate-limiting clients.
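
For instance, if the site happens to sit behind nginx (an assumption - the zone name and rates below are arbitrary), per-IP rate limiting is only a few lines:

    # in the http {} block: track clients by IP, allow ~2 requests/second each
    limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

    # in the relevant server or location block: apply it, absorbing short bursts
    location / {
        limit_req zone=perip burst=10 nodelay;
    }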


> (specified by the website owner via a robots.txt like spec).

Nope, if a website wants such a restriction, it must enforce it. Robots.txt is a request. It's worthless.


If a robot misbehaves, it'll either be blocked or get reported to the network's abuse contact, and that bot will be taken down. Whether the site could have had some kind of technical solution for this doesn't matter.


Precisely - the solution here needs to be that the server blocks the robot - if it can differentiate it from other traffic, that is. If you don't want to be archived, block the IP.
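
E.g. in nginx (203.0.113.42 below is just a documentation placeholder, not anyone's real crawler):

    # refuse requests from a misbehaving crawler's IP
    # goes in the http, server, or location block
    deny 203.0.113.42;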


The Crawl-delay directive is the de facto standard for this.
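
Roughly like this in robots.txt - though Crawl-delay was never part of the original robots.txt spec and not every crawler honors it (Googlebot ignores it, for example), so it's only a polite hint:

    User-agent: *
    # ask compliant crawlers to wait about 10 seconds between requests
    Crawl-delay: 10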



