Spider Hunter

Archive for August, 2004

16 Aug

Spider Catcher

Now that I have the preliminary Spider Checker Script working pretty well (I’m only using IP ranges and user agents, and I’m getting pretty good results), I decided to work on a Spider Catcher. I have the spider checker script running on each and every IP Address that visits any of my web sites. […]

16 Aug

Looking at the IPs near known spiders

Okay, so it’s no secret that I love having the last 15 or so months’ worth of data in my SQL server. I have tracked *EVERY* hit to my web servers in a MySQL database, which lets me easily search for spider signatures. (I just need to […]
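One way to look for IPs near a known spider is to pull every logged address that shares its /24 prefix. The sketch below assumes a hypothetical `hits` table with an `ip` column and uses Python's built-in `sqlite3` as a stand-in for MySQL; the table, column, and function names are illustrative, not the schema used here.

```python
import sqlite3
from ipaddress import IPv4Address
from typing import List, Tuple


def neighbours_of(conn: sqlite3.Connection, known_spider_ip: str) -> List[Tuple[str, int]]:
    """Return (ip, hit_count) for logged IPs in the same /24 as known_spider_ip.

    Assumes a hypothetical table: hits(ip TEXT, ...), one row per hit.
    """
    # Validate the address, then keep the first three octets as the /24 prefix.
    prefix = str(IPv4Address(known_spider_ip)).rsplit(".", 1)[0] + "."
    rows = conn.execute(
        "SELECT ip, COUNT(*) FROM hits "
        "WHERE ip LIKE ? AND ip <> ? "          # same prefix, excluding the known spider
        "GROUP BY ip ORDER BY COUNT(*) DESC, ip",
        (prefix + "%", known_spider_ip),
    )
    return rows.fetchall()
```

The neighbours it returns are only candidates; each one would still need to go through the other checks before being called a spider.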

14 Aug

Spider Checker WSDL Feed

After I get the spider checker finalized, I’m planning on putting up a Spider Checker WSDL feed that gives a percentage for each part of the checker and a weighted average (as the IP check deserves more weight than the User Agent check), along with an educated guess as to which search engine the spider is […]
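A weighted average that favors the IP check could look like the sketch below. The 2:1 weighting and the function name are assumptions for illustration; the post only says the IP check deserves more weight than the User Agent check.

```python
from typing import Optional

# Hypothetical weights: the IP check counts twice as much as the user-agent check.
IP_WEIGHT = 2.0
UA_WEIGHT = 1.0


def spider_score(ip_pct: float, ua_pct: Optional[float] = None,
                 ip_weight: float = IP_WEIGHT, ua_weight: float = UA_WEIGHT) -> float:
    """Combine per-check percentages (0-100) into one weighted average."""
    if ua_pct is None:
        # No user agent supplied: the IP check is the only evidence available.
        return ip_pct
    return (ip_pct * ip_weight + ua_pct * ua_weight) / (ip_weight + ua_weight)
```

With these weights, an IP score of 90% and a user-agent score of 60% combine to 80% rather than the plain average of 75%.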

14 Aug

Spider Catch

Now that I have a few parts of the spider checker working, even though I’ve only talked about one part so far, I’m implementing a spider catch. The idea here is to look at every IP Address in real time and evaluate it to see if it is a spider. Anything that gets a spider […]
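A real-time catch has to be cheap per request, so one reasonable approach is to score each IP once and cache the result. In the sketch below, the scoring function is injected and the 75% threshold is a hypothetical cut-off; what the catch actually does with a flagged IP is not described in the excerpt, so this version only records it.

```python
from typing import Callable, Dict, Set


class SpiderCatch:
    """Score each visiting IP once, cache the score, and collect likely spiders."""

    def __init__(self, score_ip: Callable[[str], float], threshold: float = 75.0):
        self.score_ip = score_ip          # hypothetical checker returning 0-100
        self.threshold = threshold        # hypothetical "this is a spider" cut-off
        self.cache: Dict[str, float] = {}
        self.caught: Set[str] = set()

    def visit(self, ip: str) -> bool:
        """Handle one hit; return True if the IP is considered a spider."""
        if ip not in self.cache:
            # Only the first hit from an IP pays for a full evaluation.
            self.cache[ip] = self.score_ip(ip)
            if self.cache[ip] >= self.threshold:
                self.caught.add(ip)
        return ip in self.caught
```

Caching keeps repeat hits from a busy spider from re-running the full check on every request.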

14 Aug

Spider Checker - Part 1 - IP Checker

I’m not sure exactly how many parts this Spider Checker is going to have, so we’ll all see as time goes on. (Note: A lot of people are coming here looking for their own IP Address. For the record, your IP Address is “remote_addr”.)
The basic principle is that there are a few things […]
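The core of an IP check is testing whether an address falls inside a known spider range. The sketch below uses Python's standard `ipaddress` module. The ranges are RFC 5737 documentation blocks and the spider names are placeholders, not real crawler ranges.

```python
from ipaddress import IPv4Network, ip_address, ip_network
from typing import Dict, List, Optional

# Hypothetical ranges for illustration only (RFC 5737 documentation blocks).
# A real table would be built from the ranges observed for each crawler.
SPIDER_RANGES: Dict[str, List[IPv4Network]] = {
    "example-bot-a": [ip_network("192.0.2.0/24")],
    "example-bot-b": [ip_network("198.51.100.0/25"), ip_network("203.0.113.0/24")],
}


def check_ip(ip: str) -> Optional[str]:
    """Return the name of the spider whose range contains ip, or None."""
    addr = ip_address(ip)  # raises ValueError for malformed input
    for name, networks in SPIDER_RANGES.items():
        # Membership is always False across IPv4/IPv6, so v6 input simply falls through.
        if any(addr in net for net in networks):
            return name
    return None
```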

11 Aug

Spider Checker

I’m currently working on a new spider check tool. Some of the old timers around here may remember my old spider checker. It was a tool designed to let you enter an IP address and an optional user agent, and then it would give you the percentage chance that the given IP was a spider. […]
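The optional user-agent half of such a check can be as simple as case-insensitive token matching. The tokens below are illustrative picks for crawlers named on this blog, not an authoritative list. User agents are trivially spoofed, which is one reason to weight this check below the IP check. Short tokens such as "grub" can also produce false positives.

```python
from typing import Optional

# Illustrative user-agent tokens mapped to crawler names (not an exhaustive list).
UA_TOKENS = {
    "googlebot": "Google",
    "msnbot": "MSN",
    "teoma": "Teoma",
    "grub": "Grub",
}


def check_user_agent(user_agent: Optional[str]) -> Optional[str]:
    """Return the crawler name whose token appears in user_agent, or None."""
    if not user_agent:
        return None  # the user agent is optional input
    ua = user_agent.lower()
    for token, name in UA_TOKENS.items():
        if token in ua:
            return name
    return None
```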

09 Aug

Unrelenting spiders

I’ve been going through my logs more than I have in a few years, and I am truly starting to get annoyed with spiders that simply don’t pay attention, or the ones that will pull data from your servers until they crawl and then pull some more. Right now I’m complaining about Grub and MSNbot. […]

08 Aug

Ethics of cloaking

I’ve done this one before, so this is more of an update. The ethics of cloaking often come up whenever cloaking is discussed. The bottom line is that cloaking is a tool, and like any other tool, the real issue is how you use it. If you are using cloaking to display […]

06 Aug

IP Addresses with the most visits

Have you ever looked through your logs and found which IP Addresses visit your sites the most? You guessed it: most of the time they are spiders or bots of some kind. In my case it is usually Teoma. This works well for finding spiders that come from a limited number of […]
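Counting hits per IP is a one-pass job over an access log. This sketch assumes Apache common/combined log format, where the client address is the first whitespace-separated field; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_ips(log_lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent client IPs as (ip, hits), busiest first.

    Assumes the client IP is the first field of each line (Apache common/combined format).
    """
    counts = Counter(
        line.split(maxsplit=1)[0]   # first field = client address
        for line in log_lines
        if line.strip()             # skip blank lines
    )
    return counts.most_common(n)
```

Usage: `top_ips(open("access.log"), 20)` would list the 20 busiest addresses, which can then be run through the IP and user-agent checks.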

05 Aug

How many spiders with the same name?

Google likes naming its spiders and giving the same name to a whole cluster of spiders. One thing you can do with this is look up all the IP Addresses with the same name and evaluate those IPs as well. Here is the basic process:
Find one IP Address that the reverse DNS lookup resolves […]
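A minimal sketch of the "same name" idea, assuming it means a reverse lookup of one spider IP followed by a forward lookup of the resulting hostname to collect every address published under it. It uses the standard `socket.gethostbyaddr` and `socket.gethostbyname_ex` calls. The resolvers are parameters so the logic can be tried without network access; the function name is hypothetical.

```python
import socket
from typing import Callable, List, Tuple

# Both socket resolvers return (hostname, aliaslist, ipaddrlist).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def same_name_ips(ip: str,
                  reverse: Resolver = socket.gethostbyaddr,
                  forward: Resolver = socket.gethostbyname_ex) -> Tuple[str, List[str]]:
    """Reverse-resolve ip, then forward-resolve that name; return (name, all its IPs)."""
    hostname = reverse(ip)[0]            # PTR lookup: IP -> name
    _, _, addresses = forward(hostname)  # A lookup: name -> every IP under it
    return hostname, sorted(set(addresses))
```

Each returned address is a candidate to evaluate; if the original IP is missing from the forward result, the reverse record may not be trustworthy.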

© 2008 Spider Hunter