Hacker News
Funny entries in last.fm's robots.txt (last.fm)
48 points by someone13 on Aug 16, 2011 | 14 comments



My favorite robots.txt entry is Disallow: /touch-this-and-die

That path points to a script that instantly bans the requesting IP on the server.

You'd be amazed how many bad bots hit it.
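A minimal sketch of that trap, assuming a Python backend. The robots.txt text is checked with the standard library's `urllib.robotparser` to confirm that a compliant crawler would skip the path, and a hypothetical `handle_request` function bans any IP that requests it anyway. `ROBOTS_TXT`, `compliant_bot_may_fetch`, `handle_request`, the in-memory `banned` set, and the example.com host are illustrative names, not anything from a real site.

```python
from urllib.robotparser import RobotFileParser

# Trap path from the comment above.
TRAP_PATH = "/touch-this-and-die"

ROBOTS_TXT = f"""\
User-agent: *
Disallow: {TRAP_PATH}
"""

def compliant_bot_may_fetch(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a bot that honors robots.txt would fetch this path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)

def handle_request(ip: str, path: str, banned: set[str]) -> int:
    """Return an HTTP status code, banning the client if it hits the trap."""
    if ip in banned:
        return 403
    if path == TRAP_PATH:
        # Only a bot ignoring robots.txt (or a curious human) ends up here.
        banned.add(ip)
        return 403
    return 200

banned: set[str] = set()
print(compliant_bot_may_fetch(TRAP_PATH))                # False
print(handle_request("203.0.113.7", "/", banned))        # 200
print(handle_request("203.0.113.7", TRAP_PATH, banned))  # 403
print(handle_request("203.0.113.7", "/", banned))        # 403 (now banned)
```

A real deployment would more likely push the ban into a firewall or reverse proxy than keep it in process memory.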


But that'd also ban curious non-robot geeks. :(


...who can't follow instructions. :-)


It would be nice to leave a comment for humans in there. I understand why they don't, but yeah.


Have the script clear out the list every other hour if you want to be nice, and/or show a message explaining what just happened :-)

Those warning signs about touching the third rail are there for a reason...


I've seen a slightly more sophisticated solution. /phaser/stun bans you for an hour. /phaser/kill bans you for a week.
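The tiered variant can be sketched as a table mapping each trap path to a ban duration. The one-hour and one-week durations come from the comment above; the `TieredBanList` class and its method names are hypothetical, and the current time is passed in explicitly so the expiry logic is easy to test.

```python
from datetime import datetime, timedelta

# Trap paths and ban lengths, as described in the comment above.
TRAP_DURATIONS = {
    "/phaser/stun": timedelta(hours=1),
    "/phaser/kill": timedelta(weeks=1),
}

class TieredBanList:
    """Bans IPs until an expiry time chosen by which trap they hit."""

    def __init__(self) -> None:
        self._expires: dict[str, datetime] = {}

    def record_hit(self, ip: str, path: str, now: datetime) -> None:
        duration = TRAP_DURATIONS.get(path)
        if duration is None:
            return  # not a trap path, nothing to do
        new_expiry = now + duration
        # Assumed policy: never shorten an existing ban, keep the later expiry.
        current = self._expires.get(ip)
        if current is None or new_expiry > current:
            self._expires[ip] = new_expiry

    def is_banned(self, ip: str, now: datetime) -> bool:
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expires[ip]  # ban has lapsed, forget the entry
            return False
        return True

bans = TieredBanList()
t0 = datetime(2011, 8, 16, 12, 0)
bans.record_hit("203.0.113.7", "/phaser/stun", t0)
bans.record_hit("198.51.100.9", "/phaser/kill", t0)
print(bans.is_banned("203.0.113.7", t0 + timedelta(minutes=59)))  # True
print(bans.is_banned("203.0.113.7", t0 + timedelta(hours=2)))     # False
print(bans.is_banned("198.51.100.9", t0 + timedelta(days=6)))     # True
```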


That is a brilliant idea!

I should use that in all my robots.txt files.


If you want to take it to the next level, embed it as a microdot (single-pixel) link somewhere on a side page, and use CSS to make it display:none;visibility:hidden;margin:-9999px; so that no human could possibly click on it.

Because of robots.txt, no bot should touch that link. And because it's off the page and hidden, no human either. But you'll be amazed how much it's hit.
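A sketch of emitting such a link from server-side Python templating. The CSS declarations are the ones listed in the comment; the `honeypot_link` helper name and the "." link text standing in for the single-pixel link are assumptions. `html.escape` keeps the `href` attribute well-formed if the path ever contains quotes.

```python
import html

# CSS from the comment above: hidden, invisible, and pushed off the page.
HIDDEN_STYLE = "display:none;visibility:hidden;margin:-9999px;"

def honeypot_link(trap_path: str) -> str:
    """Return an anchor that humans never see but naive crawlers may follow."""
    return (
        f'<a href="{html.escape(trap_path, quote=True)}" '
        f'style="{HIDDEN_STYLE}">.</a>'
    )

print(honeypot_link("/touch-this-and-die"))
# <a href="/touch-this-and-die" style="display:none;visibility:hidden;margin:-9999px;">.</a>
```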

But a warning: the problem with today's browsers is prefetching. You'd have to make sure you disallow it for prefetching too or you'll trap innocent humans using hyperactive browsers.
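One way to act on that warning is to skip the ban when a request announces itself as a prefetch. Browsers have marked speculative requests with headers such as `Purpose: prefetch` (older Chrome), `Sec-Purpose: prefetch` (newer Chromium), and `X-moz: prefetch` (Firefox). Treat that list as an assumption to verify against current browsers, since not every prefetcher sends a header. The `is_prefetch` and `should_ban` helpers are hypothetical.

```python
from collections.abc import Mapping

# Header names browsers have used to mark prefetches (assumed, not exhaustive).
PREFETCH_HEADERS = ("purpose", "sec-purpose", "x-moz")

def is_prefetch(headers: Mapping[str, str]) -> bool:
    """True if any known prefetch header mentions 'prefetch'."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    return any("prefetch" in lowered.get(name, "") for name in PREFETCH_HEADERS)

def should_ban(path: str, headers: Mapping[str, str],
               trap_path: str = "/touch-this-and-die") -> bool:
    """Ban only on a real (non-prefetch) request for the trap path."""
    return path == trap_path and not is_prefetch(headers)

print(should_ban("/touch-this-and-die", {"Sec-Purpose": "prefetch"}))   # False
print(should_ban("/touch-this-and-die", {"User-Agent": "BadBot/1.0"}))  # True
```

Header names are compared case-insensitively because HTTP header names are case-insensitive.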

This technique will also catch mass-downloading browser plugins that save your entire website, but they can die too in my book.

If you want to be nice, write the banlist to a file you clear out every hour via cron or with a time check.
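The time-check variant can be sketched as a banlist file whose first line records when the list was started; once that is more than an hour old, the whole file is discarded. The file format and the `load_bans`, `ban`, and `is_banned` names are hypothetical, and the sketch does no file locking, so concurrent writers would need more care. An hourly cron job that deletes the same file would give the same reset.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional, Set

MAX_AGE_SECONDS = 60 * 60  # wipe the whole list once it is an hour old

def load_bans(path: Path, now: float) -> Set[str]:
    """Return banned IPs, deleting the file if it was started over an hour ago.

    Assumed format: first line is the creation time in epoch seconds,
    each following line is one banned IP.
    """
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").split()
    if not lines or now - float(lines[0]) > MAX_AGE_SECONDS:
        path.unlink()
        return set()
    return set(lines[1:])

def ban(ip: str, path: Path, now: Optional[float] = None) -> None:
    """Append an IP, starting a fresh timestamped file if needed."""
    now = time.time() if now is None else now
    load_bans(path, now)  # removes a stale file as a side effect
    if not path.exists():
        path.write_text(f"{now}\n", encoding="utf-8")
    with path.open("a", encoding="utf-8") as f:
        f.write(ip + "\n")

def is_banned(ip: str, path: Path, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    return ip in load_bans(path, now)

p = Path(tempfile.mkdtemp()) / "banlist.txt"
ban("203.0.113.7", p, now=1000.0)
print(is_banned("203.0.113.7", p, now=1000.0 + 1800))  # True
print(is_banned("203.0.113.7", p, now=1000.0 + 3601))  # False (list expired)
```

Timing the reset from when the list was started, rather than from the file's modification time, means a steady stream of new bans cannot keep old entries alive indefinitely.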


That one even got picked up by Wired; the article also contains an interview with one of their developers at the time: http://www.wired.com/epicenter/2010/08/robot-laws/


Good catch - didn't even know that. I came across this thanks to a tweet by Mikko Hypponen[1]. It was one of the top three or four.

Fox News's robots.txt[2] is pretty interesting too: they deliberately block at least one article from being indexed[3].

And YouTube[4], as usual, shows some humor in the form of Flight Of The Conchords lyrics.

[1] http://twitter.com/#!/mikkohypponen/status/10344080313274777...

[2] http://www.foxnews.com/robots.txt

[3] http://www.foxnews.com//politics/2011/07/21/wynn-slams-obama...

[4] http://www.youtube.com/robots.txt


Asimov's 3 laws of robotics more like.


I'm a little sad that they're 404s.


Disallow: /harming/humanity/through/inaction


Technically, not listing this is correct. The three robot laws are imprinted upon their brains, and through the magic of sci-fi handwavium, robots cannot exist at all without these three laws. The "Zeroth" law, however, was derived by the robots themselves and is not imprinted upon their brains. Indeed, one of their big challenges was to work themselves into a position where they could save humanity even if it meant actively killing an individual human, honoring the implicit law over the explicit one. (This conflict ultimately killed Giskard.) It is more correct not to list this law in a robots.txt than to list it.



