Feb 20

Robots.txt: A Cracker's Best Friend

Something fairly obvious hit me in the face yesterday: robots.txt files can be a cracker’s best friend.

We knew of someone who had a directory on their site filled with the install files and license keys for all their software, so everything would be easy to find. In a cursory nod to security, they put a “disallow” rule for this folder in their robots.txt file to keep it from being indexed. In doing so, however, they simply provided a handy record, in a standardized location, of exactly what they were trying to hide.

How often does this happen, I wonder, and what does your robots.txt file reveal about your site? Yes, you can prevent search engines from indexing something (those that respect the file, anyway), but you’re also announcing to the world that there’s something there you don’t want anyone poking around in. (Remember when the White House tried this?) You may as well put out a “Start Hacking Here” sign.
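To see how little effort it takes to read that sign, here is a minimal Python sketch that pulls every Disallow path out of a robots.txt file. The sample file and its /software-installs/ entry are hypothetical stand-ins for the folder described above, and the parser only handles simple “field: value” lines. Pointed at your own file, it doubles as a quick audit.

```python
# Minimal sketch: list the Disallow paths in a robots.txt file.
# The sample file and paths below are hypothetical.

SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /software-installs/   # install files and license keys
Allow: /public/
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named in a Disallow rule, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop comments (everything after '#') and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are case-insensitive; an empty Disallow value means "allow all".
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE_ROBOTS_TXT):
        print(path)
```

To check a live site, the same function could be fed the text from urllib.request.urlopen("https://example.com/robots.txt").read().decode(), with example.com replaced by your own domain.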

If you have a secure area on your site, perhaps you’d do better with META tags?

<meta name="robots" content="noindex"/>

Same effect, but the “don’t index me” command is embedded in the page itself, which means someone has to find the page before they can see it.
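The point is that a crawler only learns about a META directive after it has already fetched the page. As a rough illustration, this minimal Python sketch uses the standard library’s html.parser to perform that check; the names RobotsMetaFinder and has_noindex are hypothetical, and it only inspects <meta name="robots"> tags.

```python
# Minimal sketch: detect a robots "noindex" META tag in an HTML page.
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <meta ... /> tags are routed here too by the default handler.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives, e.g. "noindex, nofollow".
            for directive in attr_map.get("content", "").split(","):
                self.directives.append(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if any robots META tag in the page says noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex"/></head>'
    print(has_noindex(page))  # prints True
```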

Perhaps we should all go check our robots.txt files right now to see if there’s anything incriminating in them? Mine’s cool.


Comments

by Craig Fleming,   February 21, 2007 12:37 PM  

Most search engines today (Google for instance) completely ignore this directive anyhow.


by Brian,   February 22, 2007 4:21 AM  

I would say that anyone... "creative" enough to host install files and license keys in the root of their web server without any directory-level security probably deserves to have their stuff messed with. Have they never heard of secure FTP? Integrated authentication? I wouldn't think too many sane people would even consider "protecting" anything meaningful with robots.txt.


by Michael,   February 25, 2007 12:29 PM  

I've been using this as spammer and bad-robot bait for years. There's a line in my robots.txt for a directory that isn't linked from anywhere else. Any IP that accesses that directory can only have found it by reading the robots.txt first. Works pretty nicely.


by putanerd,   August 15, 2007 3:54 PM  

I like the way you think, Michael. I will be doing the same.
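Michael’s bait trick is easy to automate. As a minimal sketch, assuming a hypothetical bait directory /no-robots-here/ that appears only in robots.txt (for example, “Disallow: /no-robots-here/”) and an access log in Common Log Format, the Python below collects the client IPs that requested it. The log lines are made up and use reserved documentation addresses.

```python
# Minimal sketch of the robots.txt honeypot: report client IPs that requested
# a directory named only in robots.txt. BAIT_PREFIX and SAMPLE_LOG are hypothetical.
import re

BAIT_PREFIX = "/no-robots-here/"

# Captures the client IP (group 1) and request path (group 2) of a Common Log Format line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)[^"]*"')


def bait_visitors(log_lines):
    """Return the set of client IPs whose request path starts with BAIT_PREFIX."""
    visitors = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(BAIT_PREFIX):
            visitors.add(match.group(1))
    return visitors


SAMPLE_LOG = [
    '192.0.2.10 - - [25/Feb/2007:12:00:01 -0500] "GET /robots.txt HTTP/1.0" 200 68',
    '192.0.2.10 - - [25/Feb/2007:12:00:02 -0500] "GET /no-robots-here/ HTTP/1.0" 404 0',
    '198.51.100.7 - - [25/Feb/2007:12:05:00 -0500] "GET /index.html HTTP/1.1" 200 5120',
]

if __name__ == "__main__":
    print(sorted(bait_visitors(SAMPLE_LOG)))  # prints ['192.0.2.10']
```

A crawler that obeys robots.txt never requests the bait path, so every address in the result either ignored the file or read it looking for targets.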


