
A "bad" search engine should treat robots.txt pretty much in reverse: anything disallowed should go to the top of the list of things to index. There are sites out there that use robots.txt rules to keep Google from indexing things that should be password-protected but aren't...
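The reason this works is that robots.txt is a public file, so its Disallow lines list the very paths a site would rather keep out of search results. Here is a minimal Python sketch of pulling those paths out. It assumes a simplified reading of the format: field names are matched case-insensitively, `#` comments are stripped, and Allow rules and per-user-agent grouping are ignored. The function name and the example paths are hypothetical.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path in a robots.txt body, in file order.

    Simplified sketch: ignores Allow rules and which User-agent group a rule is in.
    """
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if not sep:
            continue  # not a "field: value" line
        if field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow means "allow everything", so skip it
                paths.append(value)
    return paths


# Hypothetical robots.txt; the paths are made up for illustration.
EXAMPLE = """\
User-agent: *
Disallow: /internal-reports/   # hypothetical path
Disallow: /backup/
Disallow:
"""

print(disallowed_paths(EXAMPLE))  # ['/internal-reports/', '/backup/']
```

Nothing here needs authentication. The file is served to anyone who asks for it, which is why Disallow is not an access control.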



The irony is that robots.txt doesn't even prevent things from being indexed. A disallowed URL can still end up in the index if something on the Internet links to it; that's what <meta name="robots" content="noindex"> is for. (Which, ironically, only works if the page is *not* blocked by robots.txt: a blocked page can't be crawled, so the meta tag is never discovered.)
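The catch-22 can be made concrete with a minimal Python sketch of a crawler that honors robots.txt before reading a page's robots meta tag. It uses the standard library's `urllib.robotparser` and `html.parser`. The robots.txt contents, the example.com URLs, the page HTML, and the crawler name `ExampleBot` are all hypothetical, and real search engines layer more behavior on top of this. The point is only that the noindex check never runs for a disallowed URL.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks everything under /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page body; its noindex tag is only visible to a crawler that fetches it.
PAGE_HTML = '<html><head><meta name="robots" content="noindex"></head><body>hi</body></html>'


class NoindexFinder(HTMLParser):
    """Sets .noindex if the page has <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            directives = [d.strip().lower() for d in (a.get("content") or "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def crawler_sees_noindex(url: str, html: str, robots: RobotFileParser) -> bool:
    """True only if a robots.txt-respecting crawler would fetch the page and read noindex."""
    if not robots.can_fetch("ExampleBot", url):
        return False  # blocked: the page is never fetched, so its meta tag is never read
    finder = NoindexFinder()
    finder.feed(html)
    return finder.noindex


rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(crawler_sees_noindex("https://example.com/public/page.html", PAGE_HTML, rp))   # True
print(crawler_sees_noindex("https://example.com/private/page.html", PAGE_HTML, rp))  # False
```

The same page, with the same noindex tag, is only honored when robots.txt lets the crawler in. For the blocked URL, the only thing the crawler knows about is the link to it.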



