Spider Hunter

Archive for the 'Spider' Category

24 Aug

Developing a bot list

I was just thinking about creating a list of user agents that bots are known to use, and I had the thought that I might be putting the cart before the horse. To do that, I would need to have a list of known bots. This, of course, would be a much larger list than […]

20 Aug

Spiderfood for thought

Okay, spider food is not a new concept, as many of the things I am talking about here are not new concepts. Hopefully some of these are new to the readers here, or I’m adding new twists to an old idea. At least I’m hoping to get more information out for public consumption than has […]

19 Aug

Future additions to the spider checker script

After looking at how the data is collected by the spider checker script, I’m thinking of adding a few things to it. First off, I want to create a non-spider list: IP addresses that are simply not spiders. These IP addresses will be things like aol.com’s caching computers and AltaVista’s Babel Fish. (Is that still around?) […]

18 Aug

Current Status

1,363 known spiders from MSN, Google, Inktomi and Teoma are in the database right now. Not bad for a week’s worth of work. Actually, a week’s worth of scripting and 30 minutes of running the script. This list comprises every spider that I can validate with hits on one of my web servers. I’m […]

17 Aug

I love it when a plan comes together

So I was tweaking the Spider Checker and Spider Catcher scripts to make them a little more sensitive and (for lack of a better word) intelligent, and I noticed something. I have the script set up to add spiders if the percentile of it being a spider is over 0.99. Well, it started to add things […]
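The 0.99 cutoff described above could look roughly like this. It is only a sketch: the function name, the in-memory set, and the example addresses are assumptions made for illustration, and the post does not show how the actual script stores its spiders.

```python
SPIDER_THRESHOLD = 0.99  # the cutoff mentioned in the post


def maybe_add_spider(known_spiders, ip, score, threshold=SPIDER_THRESHOLD):
    """Add ip to known_spiders only when its score is strictly over the threshold.

    known_spiders is a plain set here (an assumption); score is the
    checker's 0..1 value for "this visitor is a spider".
    """
    if score > threshold:
        known_spiders.add(ip)
        return True
    return False


known = set()
maybe_add_spider(known, "192.0.2.10", 0.995)  # over the cutoff, so it is added
maybe_add_spider(known, "192.0.2.11", 0.60)   # under the cutoff, so it is ignored
```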

17 Aug

Quick IP Log

A few blog entries ago I mentioned that I was keeping a quick IP log that was storing the IP address and DNS name of each computer that visits my sites. This is different from the logs I keep for my web servers, which track each and every hit to my sites. The quick IP log […]
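A minimal sketch of a log like this, assuming one entry per visiting IP and a reverse DNS lookup for the name. The function names and the dictionary used as storage are hypothetical; only `socket.gethostbyaddr` is a real standard library call.

```python
import socket


def resolve_hostname(ip, lookup=socket.gethostbyaddr):
    """Return the reverse-DNS name for ip, or None if it does not resolve."""
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        return lookup(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        return None


def log_visit(quick_log, ip, lookup=socket.gethostbyaddr):
    """Record ip once with its DNS name; repeat visits do not trigger a new lookup."""
    if ip not in quick_log:
        quick_log[ip] = resolve_hostname(ip, lookup)
    return quick_log[ip]
```

Passing the lookup function in as a parameter keeps the sketch testable without touching the network; a real log would presumably live in a database table rather than a dictionary.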

17 Aug

Spider Checker Finished

Okay, I finished the code yesterday for the latest version of the spider checker. I’m in the process of converting the code into a WSDL feed that I will be placing on this site as well as www.realtimespiderlist.com, and I’m writing an article on the principles behind the spider checker. Hopefully I’ll have both of […]

16 Aug

Spider Catcher

Now that I have the preliminary Spider Checker script working pretty well (I’m only using IP ranges and user agents, and I’m getting pretty good results), I decided to work on a Spider Catcher. I have the spider checker script being run on each and every IP address that visits any of my web sites. […]
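A check built only on IP ranges and user agents might be sketched as follows. The range and the agent token are placeholders (a documentation-only network and a made-up bot name), not real search engine data, and the function names are assumptions.

```python
import ipaddress

# Hypothetical placeholders: a documentation-only range and a made-up agent token.
SPIDER_RANGES = [ipaddress.ip_network("192.0.2.0/24")]
SPIDER_AGENT_TOKENS = ("examplebot",)


def in_spider_range(ip):
    """True if ip falls inside any of the listed spider ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SPIDER_RANGES)


def has_spider_agent(user_agent):
    """True if the user agent string contains a listed spider token."""
    ua = user_agent.lower()
    return any(token in ua for token in SPIDER_AGENT_TOKENS)


def check_visitor(ip, user_agent):
    """Run both checks and report each result separately."""
    return {
        "ip_range": in_spider_range(ip),
        "user_agent": has_spider_agent(user_agent),
    }
```

Keeping the two results separate, rather than returning one yes/no answer, leaves room to weight them differently later.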

16 Aug

Looking at the IPs near known spiders

Okay, so it’s no secret that I love the fact that I have the last 15 or so months’ worth of data in my SQL server. I have tracked *EVERY* hit to my web servers in a MySQL database, which lets me easily search for spider signatures. (I just need to […]

14 Aug

Spider Checker WSDL Feed

After I get the spider checker finalized, I’m planning on putting up a Spider Checker WSDL feed to give the percentile for each part of the checker and a weighted average (as the IP check deserves more weight than the User Agent check), along with an educated guess as to which search engine the spider is […]
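The weighted average could be computed along these lines. The 0.7 and 0.3 weights are assumptions chosen only so that the IP check counts for more; the post does not say what weights the feed will use.

```python
def weighted_score(ip_score, agent_score, ip_weight=0.7, agent_weight=0.3):
    """Combine the two 0..1 check scores, giving the IP check more weight.

    The default weights are illustrative assumptions, not the feed's values.
    """
    total = ip_weight + agent_weight
    return (ip_score * ip_weight + agent_score * agent_weight) / total


# A visitor that matches on IP but not on user agent still scores fairly high.
combined = weighted_score(ip_score=1.0, agent_score=0.0)
```

Dividing by the sum of the weights keeps the result in the 0 to 1 range even if the weights are later changed to values that do not add up to 1.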

© 2008 Spider Hunter