22 Mar
It is old news that Google uses the sitemap protocol, which allows webmasters to tell Google’s spider, Googlebot, which web pages have been updated recently. The greatest advantage is that Google will use the sitemap to spider your web site more intelligently, in a way that you control within the bounds of the sitemap protocol.
The […]
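As a rough illustration of the idea, here is a minimal sketch that builds a sitemap with a last-modified date for each page. It assumes the sitemaps.org 0.9 namespace; the function name `build_sitemap`, the example.com URLs, and the dates are made up for the example.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace of the sitemaps.org 0.9 schema (assumed for this sketch).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified_date) pairs."""
    # Serialize the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # <lastmod> is how a page is flagged as recently updated.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages and dates, for illustration only.
example_sitemap = build_sitemap([
    ("https://www.example.com/", date(2024, 1, 15)),
    ("https://www.example.com/about.html", date(2023, 11, 2)),
])
print(example_sitemap)
```

The spider still decides what to crawl and when; the `lastmod` values are only a hint.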
Posted in Spider, SEO by: simpleenigma
No Comments
18 May
I’m a big fan of Jenstar from WebmasterWorld as well as her JenSense web site. For the most part I’m interested in the information about AdSense and other contextual advertising mediums, but occasionally she surprises me with information that is a bit more technical than I would expect. (Not that Jen isn’t technical, but […]
Posted in Spider, GoogleBot by: simpleenigma
No Comments
16 Mar
Over the years I’ve gone back and forth over which spiders to track more diligently than others and which ones I simply don’t care about any more. For the most part these days the only spider I care about is GoogleBot.
I think that MSNBot is of some consequence, as is Inktomi/Yahoo’s Slurp, but Google is […]
Posted in Spider by: simpleenigma
No Comments
11 Sep
This is a question that I have pondered for a while: whether GoogleBot, or any spider for that matter, looks at the HTTP status codes.
Now we know that at least GoogleBot does.
In a recent post on the Google Sitemaps blog, they talked about Verifying your site - trouble with 404 pages.
Basically they are trying […]
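To see what status code your own server sends for a page that does not exist, something like the sketch below could help. It is only an illustration: the helper names `status_of` and `classify_missing_page` are made up here, and it is an assumption of this sketch that the trouble with 404 pages comes from servers answering 200 for missing pages.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def status_of(url):
    """Return the final HTTP status code for url (redirects are followed)."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx answers; the code is still useful.
        return err.code


def classify_missing_page(status):
    """Judge the status a server returned for a URL known not to exist."""
    if status in (404, 410):
        return "ok"        # the server correctly reports the page as missing
    if 200 <= status < 300:
        return "soft 404"  # an error page served with a success status
    return "other"


# Example (needs network access; the URL is a placeholder):
# print(classify_missing_page(status_of("https://www.example.com/no-such-page")))
```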
Posted in Spider, GoogleBot by: simpleenigma
No Comments
22 Aug
The GoogleBot Media Bot is the separate spider that is used to evaluate Google AdSense content. Just recently Google added a way for you to tell the Media Bot what areas of your page are more relevant than others.
Section targeting allows you to designate certain areas that should be weighted higher or lower for the […]
Posted in Spider, GoogleBot by: simpleenigma
No Comments
08 Feb
Now that I have some real data on ASNs to work with, I decided to see which ASNs were responsible for hitting my robots.txt file the most, and I like the results. Here are the top 4:
* AS15169 : 630 Hits : Google
* AS14776 : 417 Hits : Inktomi
* AS3561 : 230 […]
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
29 Jan
I was just going through my notes on tracking search engine spiders and I thought I’d add some robots.txt data to the IP database. Turns out that in the past 20 months only 4,532 distinct IP addresses have ever looked at a robots.txt on my system. Interesting enough, but it sure does cut down […]
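For anyone who wants a similar count from their own logs, here is a minimal sketch. It assumes access logs in Common Log Format; the regular expression, the function name `robots_txt_visitors`, and the sample lines are all made up for illustration and are not how the IP database itself is built.

```python
import re

# Client IP is the first field; the request path follows the method in quotes.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (\S+)')


def robots_txt_visitors(lines):
    """Return the set of distinct client IPs that requested /robots.txt."""
    ips = set()
    for line in lines:
        match = LOG_LINE.match(line)
        # Ignore any query string when comparing the path.
        if match and match.group(2).split("?", 1)[0] == "/robots.txt":
            ips.add(match.group(1))
    return ips


# Invented sample log lines using documentation-only IP ranges.
sample_log = [
    '192.0.2.10 - - [10/Jan/2024:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120',
    '192.0.2.10 - - [10/Jan/2024:11:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120',
    '198.51.100.7 - - [10/Jan/2024:11:05:00 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.5 - - [10/Jan/2024:12:30:00 +0000] "HEAD /robots.txt HTTP/1.0" 200 0',
]
visitors = robots_txt_visitors(sample_log)
print(len(visitors), sorted(visitors))
```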
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
19 Jan
I have been looking for a way for quite some time to figure out the CIDR numbers for an IP Address. Until today I was thinking I was going to have to data mine the IP registrars, but the IP to Country database in an earlier blog post is giving me a way to do […]
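One way this could work, sketched under the assumption that the IP to Country database lists each block as a first and last address: Python's standard `ipaddress` module can collapse such a range into CIDR blocks. The function name `range_to_cidrs` and the sample ranges are hypothetical.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover the range first..last."""
    # ip_address() accepts dotted strings and plain integers alike, which is
    # handy if the database stores addresses as numbers.
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


# Invented ranges from the documentation-only 192.0.2.0/24 block.
print(range_to_cidrs("192.0.2.0", "192.0.2.255"))  # a range that is one clean block
print(range_to_cidrs("192.0.2.0", "192.0.2.130"))  # a range that needs several blocks
```

A range that does not fall on a power-of-two boundary comes back as several CIDR blocks rather than one.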
Posted in SpiderHunter, Spider, IP Tracking by: simpleenigma
No Comments
06 Jan
I talked about getting TimeZone data a few days ago to figure out where IP addresses are located. I realized something while working with it: I will never see the time zone data for a visitor without JavaScript. This statement alone sounds disheartening until you realize what does not have JavaScript, and that would be […]
Posted in SpiderHunter, Spider, GoogleBot, IP Tracking by: simpleenigma
No Comments
18 Sep
I just went through my IP database, which now stands at 383,472 IP addresses checked :-), looking at the IP addresses whose percentile score is 0.75. This means that the IP address has never had any cookies, has never had any referrer data, and has a known good spider user agent. 3 […]
Posted in SpiderHunter, Spider by: simpleenigma
No Comments