Spider Hunter

09 Aug

Unrelenting spiders

I’ve been going through my logs more than I have in a few years, and I am truly starting to get annoyed with spiders that simply don’t pay attention. Or the ones that will pull data from your servers until the servers slow to a crawl, and then pull some more. Right now I’m complaining about Grub and MSNbot. Not only are these two spiders slowing my servers down, but as far as I can tell the data they are collecting isn’t being used anywhere. Sure, WiseNut is using some of the data collected by Grub, but who really uses WiseNut? I never see any referrers in my logs from there …

In any case, for those of you who are writing or maintaining these spiders, maybe you should add a few more things to them. Like a governor to throttle back the hits on a web server. I know some spiders look at the domain name and try not to hit the server too many times. Have you ever thought of looking at the IP address as well? I have an Open Directory Project directory structure set up on one IP address and, through smoke and mirrors, it services multiple domains. So when you go through and hit my ODP data on different domains, it doesn’t matter if you throttle back on one domain if you are still accessing all of the other ones at the same time. Just a little more forethought and 10 lines of code would make a few more webmasters just a little happier …
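To give a rough idea of what that governor could look like, here is a minimal Python sketch that throttles per IP address instead of per domain name. It is not taken from any real spider: the class name `IpThrottle`, the `wait` method and the 5-second default delay are all made up for illustration, and it assumes a single-threaded crawler that resolves each hostname with `socket.gethostbyname` (IPv4 only).

```python
import socket
import time
from urllib.parse import urlsplit


class IpThrottle:
    """Hypothetical governor: enforce a minimum delay between hits on one IP."""

    def __init__(self, min_delay=5.0, resolve=socket.gethostbyname,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # seconds between hits (assumed value)
        self.resolve = resolve      # hostname -> IP address
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = {}      # IP address -> earliest time of next hit

    def wait(self, url):
        """Sleep until the server behind `url` may be hit again.

        Returns the number of seconds slept.
        """
        ip = self.resolve(urlsplit(url).hostname)
        now = self.clock()
        delay = max(0.0, self.next_allowed.get(ip, now) - now)
        if delay:
            self.sleep(delay)
        # Key the bookkeeping on the IP, so every domain served from the
        # same address shares one budget.
        self.next_allowed[ip] = now + delay + self.min_delay
        return delay
```

A spider would call `throttle.wait(url)` right before each fetch. Two domains that resolve to the same address then take turns instead of each getting its own allowance, which is the whole point.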



© 2008 Spider Hunter