16 Aug
Now that I have the preliminary Spider Checker Script working pretty well (I’m only using IP ranges and user agents, and I’m getting pretty good results), I decided to work on a Spider Catcher. I have the spider checker script being run on each and every IP Address that visits any of my web sites. […]
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
16 Aug
Okay, so it’s no secret that I love the fact that I have the last 15 or so months’ worth of data in my SQL server. I have tracked *EVERY* hit to my web servers in a MySQL database that allows me to easily search through the database for spider signatures. (I just need to […]
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
14 Aug
After I get the spider checker finalized I’m planning on putting up a Spider Checker WSDL feed to give the percentage for each part of the checker and a weighted average (as the IP check deserves more weight than the User Agent check), along with an educated guess as to which search engine the spider is […]
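As a rough illustration of the weighted average mentioned here, a minimal Python sketch follows. The function name, the 0-100 score scale and the 2:1 weighting are assumptions made for the example; the post only says that the IP check should count for more than the User Agent check.

```python
def weighted_spider_score(ip_score, ua_score, ip_weight=2.0, ua_weight=1.0):
    """Combine two per-check percentages (0-100) into one weighted average.

    The default 2:1 weighting is a placeholder, not a value from the post.
    """
    total = ip_weight + ua_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return (ip_score * ip_weight + ua_score * ua_weight) / total


if __name__ == "__main__":
    # A strong IP match with a weak User Agent match still scores high.
    print(weighted_spider_score(90, 30))
```

Dividing by the sum of the weights keeps the result on the same 0-100 scale as the inputs, so more checks could be added later without rescaling.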
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
14 Aug
Now that I have a few parts of the spider checker working, even though I’ve only talked about one part so far, I’m implementing a spider catch. The idea here is to look at every IP Address in real time and evaluate it to see if it is a spider. Anything that gets a spider […]
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
14 Aug
I’m not sure exactly how many parts of this Spider Checker there are going to be, so we’ll all see as time goes on. (Note: A lot of people are coming here looking for their own IP Address. For the record, your IP Address is “remote_addr”)
The basic principle is that there are a few things […]
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
11 Aug
I’m currently working on a new spider check tool. Some of the old timers around here may remember my old spider checker. It was a tool designed to let you enter an IP address and an optional user agent and then it would give you the percentage chance that the given IP was a spider. […]
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
09 Aug
I’ve been going through my logs more than I have in a few years and I am truly starting to get annoyed with spiders that simply don’t pay attention. Or the ones that will pull data from your servers until they crawl and then pull some more. Right now I’m complaining about Grub and MSNbot. […]
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
08 Aug
I’ve done this one before, so this is more of an update. Often times the ethics of cloaking are brought up in conversations about cloaking. The bottom line is that cloaking is a tool and like any other tool the real issue is how you use that tool. If you are using cloaking to display […]
Posted in SpiderHunter, Cloaking by: simpleenigma
No Comments
06 Aug
Have you ever looked through your logs and found out which IP Addresses visit your sites the most? You guessed it, most of the time they are spiders or bots of some kind. In my case it tends to be Teoma. This tends to work well for finding spiders that come from a limited number of […]
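For anyone who wants to try the "which IPs visit the most" check, here is a small Python sketch. It assumes a log format where the client IP is the first whitespace-separated field on each line (as in the common Apache access log layout); the function name and sample lines are made up for the example.

```python
from collections import Counter


def top_visitors(log_lines, n=10):
    """Return the n most frequent client IPs as (ip, hits) pairs.

    Assumes the IP address is the first field on each non-empty line.
    """
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        '192.0.2.10 - - "GET / HTTP/1.1" 200',
        '192.0.2.10 - - "GET /robots.txt HTTP/1.1" 200',
        '192.0.2.77 - - "GET /about HTTP/1.1" 200',
    ]
    for ip, hits in top_visitors(sample, n=5):
        print(ip, hits)
```

The addresses at the top of the list are the ones worth checking by hand first.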
Posted in SpiderHunter, Spider, GoogleBot, IP Tracking by: simpleenigma
No Comments
05 Aug
Google likes naming their spiders and giving the same name to a whole cluster of spiders. One thing that you can do with this is look up all the IP Addresses with the same name and evaluate those IPs as well. Here is the basic process:
Find one IP Address that the reverse DNS lookup resolves […]
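The excerpt cuts off after the first step, so the following Python sketch is only a guess at the general idea: take one IP, find the name its reverse DNS lookup returns, then ask what other addresses that name resolves to. The function name and the injectable `reverse`/`forward` parameters are assumptions for the example; the defaults are the standard library calls `socket.gethostbyaddr` and `socket.gethostbyname_ex`.

```python
import socket


def ips_sharing_name(ip, reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex):
    """Return (hostname, ips) for the name that `ip` reverse-resolves to.

    `reverse` and `forward` can be swapped out for testing; both default
    lookups return a (hostname, aliases, addresses) tuple.
    """
    hostname = reverse(ip)[0]          # reverse DNS: IP -> name
    addresses = forward(hostname)[2]   # forward DNS: name -> all IPs
    return hostname, sorted(set(addresses))
```

Calling `ips_sharing_name()` with a real address needs live DNS and raises `socket.herror` or `socket.gaierror` when a lookup fails, so wrap it accordingly when running it over a whole log.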
Posted in SpiderHunter, Spider, GoogleBot by: simpleenigma
No Comments