
That's a good point. I'm not sure how you'd get around non-HTML documents (e.g. PDFs), but web pages themselves can be excluded via a meta tag:

    <meta name="robots" content="noindex">
Source: https://support.google.com/webmasters/answer/93710?hl=en

Interestingly, that article includes the following disclaimer about why robots.txt shouldn't be used for your example:

"Important! For the noindex meta tag to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex tag, and the page can still appear in search results, for example if other pages link to it."

I must admit even I hadn't realised that could happen, and I was critical of the use of robots.txt to begin with.
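To make the interaction concrete, here's a rough sketch of the conflicting setup that disclaimer warns about, using a hypothetical page /private.html (the path is just for illustration): the robots.txt rule stops the crawler from ever fetching the page, so the noindex in its head is never read, and the URL can still end up in results via inbound links.

    # robots.txt -- blocks crawling, so the crawler never sees the tag below
    User-agent: *
    Disallow: /private.html

    <!-- in the head of /private.html -- only effective if the page can be crawled -->
    <meta name="robots" content="noindex">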




For PDFs you can use the X-Robots-Tag HTTP header [0] (example sketch below).

Nofollow is a good suggestion if you control the links to the resource, robots if you don't.

[0]: https://developers.google.com/webmasters/control-crawl-index...
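For illustration, one way to attach that header to every PDF response, assuming an nginx server (the location pattern and setup are my example, not from the linked doc):

    # nginx: send X-Robots-Tag for any URL ending in .pdf
    location ~* \.pdf$ {
        add_header X-Robots-Tag "noindex";
    }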


Ah, that's true indeed. The page, though, will appear as a link without any content, because the bot won't be able to index it.


Except it has indexed it; it just hasn't crawled it. But content or not, the aim you were trying to achieve (namely, keeping your content out of the index) has failed. Thus you are once again dependent on other countermeasures that render the robots.txt irrelevant.



