Enabling historical revisionism: Very interesting note from the Democratic National Committee:
Sometime between April 2003 and October 2003, someone at the White House added virtually all of the directories with ‘Iraq’ in them to its robots.txt file, meaning that search engines would no longer list those pages in results or archive them.
The offending robots.txt file is still out there for your viewing pleasure. The quote above isn't quite correct: pages already in a search index are not affected by a change to robots.txt until the spider comes back to reindex them, at which point they should be removed from the index.
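For the curious, here is a rough sketch of how a well-behaved spider would read rules like these, using Python's standard `urllib.robotparser`. The paths and the `ExampleBot` agent name are made up for illustration; they are not copied from the actual White House file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules in the spirit of the quote above.
# These paths are illustrative assumptions, not the real file's contents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /example/iraq/
Disallow: /example/releases/iraq/
"""


def check_paths(robots_txt, paths, agent="ExampleBot"):
    """Return {path: True/False} saying whether `agent` may fetch each path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


result = check_paths(
    ROBOTS_TXT,
    ["/example/iraq/index.html", "/news/index.html"],
)
for path, allowed in result.items():
    print(path, "allowed" if allowed else "blocked")
```

Note that this check only happens when the spider visits. Nothing in the file reaches into an existing index, which is why pages already indexed linger until the next crawl.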
Something fairly obvious hit me in the face yesterday: robots.txt files can be a cracker's best friend. We knew of someone who kept a directory on their site filled with the install files and license keys for all their software so that everything would be easy to find. In a cursory…
Cryptome points us to this link: the robots.txt file at the FBI's site. I'll save you the click:

User-agent: *
Disallow: /

This isn't the first time the government has shown some nervousness about their Web site content: The White House Vs. Search Spiders.
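To see what those two lines mean in practice, here is a small sketch with Python's standard `urllib.robotparser`. The agent name and the sample paths are assumptions picked for the demo, not anything taken from the FBI's site.

```python
from urllib.robotparser import RobotFileParser

# The two directives quoted above, one per line.
BLANKET_BLOCK = ["User-agent: *", "Disallow: /"]


def is_blocked(path, agent="ExampleBot"):
    """True if a compliant crawler named `agent` must skip `path`."""
    parser = RobotFileParser()
    parser.parse(BLANKET_BLOCK)
    return not parser.can_fetch(agent, path)


# Hypothetical paths; every one starts with "/", so every one is blocked.
for sample in ["/", "/press/release.html", "/any/other/page"]:
    print(sample, "blocked" if is_blocked(sample) else "allowed")
```

Because every URL path begins with `/`, a single `Disallow: /` under the wildcard agent tells all compliant spiders to stay out of the whole site. It only binds crawlers that choose to honor the file.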