24 Aug
I was just thinking about creating a list of user agents that bots are known to use and I had the thought that I might be putting the cart before the horse. To do that I would need to have a list of known bots. This of course would be a much larger list than […]
Posted in SpiderHunter, Spider by: simpleenigma
No Comments
20 Aug
Okay, spider food is not a new concept, just as many of the things I am talking about here are not new concepts. Hopefully some of these are new to the readers here or I’m adding new twists to an old idea. At least I’m hoping to get more information out for public consumption than has […]
Posted in SpiderHunter, Spider, GoogleBot by: simpleenigma
No Comments
19 Aug
After looking at how the data is collected by the spider checker script, I’m thinking of adding a few things to it. First off, I want to create a non-spider list: IP addresses that are simply not spiders. These IP addresses will be things like aol.com’s caching computers and altavista’s babelfish. (Is that still around?) […]
Posted in SpiderHunter, Spider, GoogleBot by: simpleenigma
No Comments
18 Aug
1,363 known spiders from MSN, Google, Inktomi and Teoma are in the database right now. Not bad for a week’s worth of work. Actually, a week’s worth of scripting and 30 minutes of running the script. This list comprises every spider that I can validate with hits on one of my web servers. I’m […]
Posted in SpiderHunter, Spider, GoogleBot by: simpleenigma
No Comments
17 Aug
So I was tweaking the Spider Checker and Spider Catcher scripts to make them a little more sensitive and (for lack of a better word) intelligent, and I noticed something. I have the script set up to add spiders if the percentile of it being a spider is over 0.99. Well, it started to add things […]
Posted in SpiderHunter, Spider by: simpleenigma
No Comments
17 Aug
A few blog entries ago I mentioned that I was keeping a quick IP log that was storing the IP address and DNS name of each computer that visits my sites. This is different from the logs I keep for my webservers, which track each and every hit to my sites. The quick IP log […]
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
17 Aug
Okay, I finished the code yesterday for the latest version of the spider checker. I’m in the process of converting the code into a WSDL feed that I will be placing on this site as well as www.realtimespiderlist.com, and I’m writing an article on the principles behind the spider checker. Hopefully I’ll have both of […]
Posted in SpiderHunter, Spider by: simpleenigma
No Comments
16 Aug
Now that I have the preliminary Spider Checker Script working pretty well (I’m only using IP ranges and user agents, and I’m getting pretty good results), I decided to work on a Spider Catcher. I have the spider checker script being run on each and every IP address that visits any of my web sites. […]
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
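For readers who want a feel for a check based only on IP ranges and user agents, here is a minimal sketch. It is not the actual Spider Checker Script; the function name, the placeholder range (a reserved documentation network) and the `examplebot` token are all made up for illustration.

```python
import ipaddress

# Hypothetical data: a reserved documentation range and a made-up bot token.
# A real list would come from validated spider hits in the server logs.
KNOWN_SPIDER_RANGES = [ipaddress.ip_network("192.0.2.0/24")]
KNOWN_SPIDER_AGENT_TOKENS = ["examplebot"]


def check_visitor(ip, user_agent):
    """Return which of the two signals (IP range, user agent) matched."""
    addr = ipaddress.ip_address(ip)
    ip_match = any(addr in net for net in KNOWN_SPIDER_RANGES)
    agent = user_agent.lower()
    ua_match = any(token in agent for token in KNOWN_SPIDER_AGENT_TOKENS)
    return {"ip_match": ip_match, "ua_match": ua_match}
```

A catcher along these lines would call `check_visitor` for every visiting IP address and store anything that matches for later review.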
16 Aug
Okay, so it’s no secret that I love the fact that I have the last 15 or so months’ worth of data in my SQL server. I have tracked *EVERY* hit to my web servers in a MySQL database that allows me to easily search through the database for spider signatures. (I just need to […]
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
14 Aug
After I get the spider checker finalized, I’m planning on putting up a Spider Checker WSDL feed to give the percentiles of each part of the checker and a weighted average (as the IP check deserves more weight than the User Agent check), along with an educated guess as to which search engine the spider is […]
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
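The weighted average mentioned in the 14 Aug entry could look something like the sketch below. The 0.7 and 0.3 weights are assumptions picked only to show the IP check counting for more than the User Agent check; the post does not give the real values, and the function name is hypothetical.

```python
def weighted_spider_score(ip_score, ua_score, ip_weight=0.7, ua_weight=0.3):
    """Combine two scores in the 0..1 range into one weighted average.

    The default weights are illustrative assumptions, not values from the
    Spider Checker; they only encode "IP check outweighs User Agent check".
    """
    total_weight = ip_weight + ua_weight
    return (ip_score * ip_weight + ua_score * ua_weight) / total_weight
```

With these weights, a visitor that matches only on IP scores higher than one that matches only on user agent, which is the behaviour the entry describes.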