Robots.txt: A Cracker's Best Friend

Feb 20

Something fairly obvious hit me in the face yesterday: robots.txt files can be a cracker’s best friend.

We knew of someone who kept a directory on their site filled with the install files and license keys for all their software, so that everything would be easy for them to find. In a cursory nod to security, they put a “disallow” rule for this folder in their robots.txt file to ensure it wasn’t indexed. In doing so, however, they simply provided a handy record, in a standardized location, for anyone looking for exactly the thing they were trying to hide.

How often does this happen, I wonder, and what does your robots.txt file reveal about your site? Yes, you can prevent search engines from indexing something (those that respect the file, anyway), but you’re also announcing to the world that there’s something there you don’t want anyone poking around in. (Remember when the White House tried this?) You may as well put out a “Start Hacking Here” sign.

If you have a secure area on your site, perhaps you’d do better with META tags?

<meta name="robots" content="noindex"/>

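Same effect as far as indexing goes, but the “don’t index me” command is embedded in the page itself, which means you have to find the page first. Keep in mind that a crawler can only see the tag if it is allowed to fetch the page, so the page must not also be disallowed in robots.txt, and that neither approach stops a visitor who already knows the URL.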
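To make “you have to find it first” concrete, here is a minimal Python sketch of how a crawler might spot the tag once it already has the page's HTML in hand. The names `RobotsMetaFinder` and `page_is_noindex` are made up for illustration, and real crawlers handle more cases than this.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def page_is_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"/></head></html>'
    print(page_is_noindex(page))
```

The point of the sketch is that the check runs on the page body, so nothing about the page is advertised until someone has already requested it.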

Perhaps we should all go check our robots.txt files right now to see if there’s anything incriminating in them? Mine’s cool.
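For anyone who wants to run that check, here is a rough Python sketch that lists every path a robots.txt file tells the world to stay away from, which is the same list a curious visitor would read. The function name `disallowed_paths` and the sample file contents are hypothetical; paste in your own file's text to try it.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path, in order, without duplicates."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if not sep:
            continue
        if field.strip().lower() == "disallow":
            path = value.strip()
            # An empty Disallow means nothing is blocked, so skip it.
            if path and path not in paths:
                paths.append(path)
    return paths


# Hypothetical file contents, similar in spirit to the story above.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /software-and-keys/   # please don't look here
Disallow:
"""

if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE):
        print(path)
```

If anything in the output makes you wince, the fix is to protect or move that directory, not to drop the line and hope.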


Comments

by Craig Fleming,   February 21, 2007 12:37 PM  

Most search engines today (Google for instance) completely ignore this directive anyhow.


by Brian,   February 22, 2007 4:21 AM  

I would say that anyone... "creative" enough to host install files and license keys in the root of their web server without any directory-level security probably deserves to have their stuff messed with. Have they never heard of secure FTP? Integrated authentication? I wouldn't think too many sane people would even consider "protecting" anything meaningful with robots.txt.


by Michael,   February 25, 2007 12:29 PM  

I've been using this as spammer and bad-robot bait for years. There's a line in my robots.txt for a directory that is not linked from anywhere else. Any IP that accesses that directory can only have found it by looking at the robots.txt first. Works pretty nicely.
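The bait described in this comment could be checked with something like the Python sketch below, which scans access-log lines for requests under the trap directory. It assumes logs in Common Log Format; the `/bait/` path, the function name `trap_visitors`, and the sample log lines are all invented for illustration.

```python
import re

# Hypothetical trap path: listed under Disallow in robots.txt, linked nowhere.
TRAP_PREFIX = "/bait/"

# Start of a Common Log Format line: client IP, two fields, [timestamp], "METHOD path
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:[A-Z]+) (\S+)')


def trap_visitors(log_lines, trap_prefix=TRAP_PREFIX):
    """Return the set of client IPs that requested anything under the trap."""
    offenders = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(trap_prefix):
            offenders.add(match.group(1))
    return offenders


# Made-up log lines using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.7 - - [25/Feb/2007:12:29:00 +0000] "GET /robots.txt HTTP/1.1" 200 64',
    '203.0.113.7 - - [25/Feb/2007:12:29:02 +0000] "GET /bait/ HTTP/1.1" 200 12',
    '198.51.100.4 - - [25/Feb/2007:12:30:10 +0000] "GET /index.html HTTP/1.1" 200 5120',
]

if __name__ == "__main__":
    for ip in sorted(trap_visitors(SAMPLE_LOG)):
        print(ip)
```

What to do with the flagged addresses (block, rate-limit, or just watch) is a separate decision that the sketch leaves open.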


by putanerd,   August 15, 2007 3:54 PM  

I like the way you think, Michael. I will be doing the same.



