New Article on the Spider Checker is up
I finally posted the article on the Spider Checker yesterday afternoon. You can find it at http://www.spiderhunter.com/articles/6/
I was just thinking about creating a list of user agents that bots are known to use and I had the thought that I might be putting the cart before the horse. To do that I would need to have a list of known bots. This of course would be a much larger list than […]
Sorry about not posting anything for a few days. I’ve been writing code and trying to rewrite some of the base code that runs most everything. Ever tried to rewrite the base code without killing the running applications? It’s fun, I swear. I’m hoping to have most of this done in the next few days, […]
Okay, spider food is not a new concept, just as many of the things I am talking about here are not new concepts. Hopefully some of these are new to the readers here or I’m adding new twists to an old idea. At least I’m hoping to get more information out for public consumption than has […]
After looking at how the data is collected by the spider checker script, I’m thinking of adding a few things to it. First off, I want to create a non-spider list: IP addresses that are simply not spiders. These IP addresses will be things like aol.com’s caching computers and altavista’s babelfish. (Is that still around?) […]
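A non-spider list like the one described above could be sketched roughly as follows. This is only an illustration in Python, not the Spider Checker's actual code: the name `is_known_non_spider` and the address ranges are hypothetical placeholders (reserved documentation networks), not real AOL or AltaVista addresses.

```python
import ipaddress

# Hypothetical placeholder entries (reserved documentation ranges).
# A real list would hold the proxy/cache addresses collected from the logs.
NON_SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # e.g. a block of caching proxies
    ipaddress.ip_network("198.51.100.17/32"),  # e.g. a single translation service host
]


def is_known_non_spider(ip: str) -> bool:
    """Return True if the IP falls inside any network on the non-spider list."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in NON_SPIDER_NETWORKS)
```

Checking this list before any spider scoring would let known proxies and caches be skipped early.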
1,363 known spiders from MSN, Google, Inktomi and Teoma are in the database right now. Not bad for a week’s worth of work. Actually, a week’s worth of scripting and 30 minutes of running the script. This list comprises every spider that I can validate with hits on one of my web servers. I’m […]
So I was tweaking the Spider Checker and Spider Catcher scripts to make them a little more sensitive and (for lack of a better word) intelligent, and I noticed something. I have the script set up to add spiders if the probability of it being a spider is over 0.99. Well, it started to add things […]
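The 0.99 cutoff mentioned above amounts to a simple threshold test. A minimal Python sketch, assuming the script produces a score between 0 and 1 for each visitor (how that score is computed is not shown in the post, and the function name here is hypothetical):

```python
SPIDER_THRESHOLD = 0.99  # the cutoff value stated in the post


def should_add_spider(score: float, threshold: float = SPIDER_THRESHOLD) -> bool:
    """Return True only when the score is strictly over the threshold."""
    return score > threshold
```

With a strict comparison, a score of exactly 0.99 would not be added; whether the original script treats that edge case the same way is an assumption.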
I just wrote the Spider Checker tool for the tools section of this website. It is up and running as of now. I also just added the Spider Checker WSDL feed both here and at www.realtimespiderlist.com.
The URLs for accessing these feeds are:
www.spiderhunter.com/spidercheck.cfc?WSDL or
www.realtimespiderlist.com/spidercheck.cfc?WSDL
The only attribute that this feed takes is a variable called IP, for […]
A few blog entries ago I mentioned that I was keeping a quick IP log that stores the IP address and DNS name of each computer that visits my sites. This is different from the logs I keep for my webservers, which track each and every hit to my sites. The quick IP log […]
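To illustrate the difference from a per-hit server log, here is a rough Python sketch of a quick IP log that keeps one entry per visiting address along with its DNS name. The function names and the in-memory dictionary are assumptions made for illustration; the post does not say how the real log is stored.

```python
import socket
from typing import Callable, Dict


def reverse_dns(ip: str) -> str:
    """Look up the DNS name for an IP; return an empty string if none resolves."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return ""


def record_visit(
    log: Dict[str, str],
    ip: str,
    resolve: Callable[[str], str] = reverse_dns,
) -> None:
    """Store the IP and its DNS name once per visitor, not once per hit."""
    if ip not in log:
        log[ip] = resolve(ip)
```

Passing the resolver as a parameter keeps the lookup swappable, so the logging logic can be exercised without making real DNS queries.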
Okay, I finished the code yesterday for the latest version of the spider checker. I’m in the process of converting the code into a WSDL feed that I will be placing on this site as well as www.realtimespiderlist.com, and I’m writing an article on the principles behind the spider checker. Hopefully I’ll have both of […]
© 2008 Spider Hunter