Virtual Robots.txt
Most people who have been around the web for a while know that a robots.txt file tells a spider where it may and may not go. Any well-behaved spider will look at the /robots.txt file at the root of a web site to see whether it has permission to visit the site and whether there are any areas it should stay out of. (Notice the key phrase: ‘any well-behaved spider’.) I’m not going to go into the formatting of the robots.txt file here. I will someday, but until then there are plenty of resources for that already.

My proposal is an extension of the 404 Trap: a Virtual robots.txt. Like the 404 Trap, which is in yesterday’s blog, the virtual robots.txt will not really exist as a file on disk. It would be generated dynamically by a 404 Trap, which lets you secure your robots.txt. This would require an up-to-date database of spider IP addresses, like the one I am working on at www.realtimespiderlist.com. (Shameless plug, I know.)

The idea is that you show a generic robots.txt to any IP address that is not on the spider list, and a specific robots.txt to each individual spider. There is no reason for a regular user ever to look at your robots.txt file; anyone who is looking at it is either running software to download your site to their hard drive or trying to hack your site.
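
To make the idea concrete, here is a minimal sketch in Python of the lookup a 404 Trap could run when /robots.txt is requested. Everything named here is an assumption for illustration: the spider names, the IP ranges (reserved documentation ranges, not real spiders), the contents of each robots.txt, and the function names. A real version would pull the ranges from a live spider database rather than a hard-coded table.

```python
from ipaddress import ip_address, ip_network

# Placeholder robots.txt shown to any visitor that is NOT on the spider list.
GENERIC_ROBOTS = "User-agent: *\nDisallow: /\n"

# Hypothetical spider list: spider name -> IP ranges it crawls from.
# These are reserved documentation ranges, used here only as stand-ins.
SPIDER_NETWORKS = {
    "examplebot": [ip_network("192.0.2.0/24")],
    "otherbot": [ip_network("198.51.100.0/24")],
}

# Hypothetical per-spider robots.txt bodies.
SPIDER_ROBOTS = {
    "examplebot": "User-agent: *\nDisallow: /private/\n",
    "otherbot": "User-agent: *\nDisallow: /private/\nDisallow: /images/\n",
}


def identify_spider(remote_ip):
    """Return the name of the known spider that owns remote_ip, or None."""
    try:
        addr = ip_address(remote_ip)
    except ValueError:
        # Not a parseable IP address, so it cannot be on the spider list.
        return None
    for name, networks in SPIDER_NETWORKS.items():
        if any(addr in net for net in networks):
            return name
    return None


def virtual_robots_txt(remote_ip):
    """Pick the robots.txt body to send back to this visitor."""
    name = identify_spider(remote_ip)
    # Unknown visitors (name is None) fall through to the generic file.
    return SPIDER_ROBOTS.get(name, GENERIC_ROBOTS)
```

The 404 handler would call `virtual_robots_txt()` with the visitor's address and send the result back as plain text with a normal 200 status, so to the spider it looks like an ordinary file.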






