<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Spider Hunter</title>
	<link>http://spiderhunter.com</link>
	<description>IP Address Tracking, IP Address database and Search Engine Spider information</description>
	<pubDate>Mon, 24 Dec 2007 04:40:41 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.1</generator>
	<language>en</language>
			<item>
		<title>Current Project</title>
		<link>http://spiderhunter.com/spiderl/131/current-project/</link>
		<comments>http://spiderhunter.com/spiderl/131/current-project/#comments</comments>
		<pubDate>Mon, 24 Dec 2007 04:40:41 +0000</pubDate>
		<dc:creator>simpleenigma</dc:creator>
		
		<category><![CDATA[SpidErl]]></category>

		<category><![CDATA[IMAP]]></category>

		<guid isPermaLink="false">http://spiderhunter.com/spiderl/131/current-project/</guid>
		<description><![CDATA[I know I haven&#8217;t posted to this site in a very long time, but I have been working on some email related projects for quite some time.
I&#8217;ve been writing an open source email (SMTP,IMAP and POP) server in Erlang called ErlMail at http://erlsoft.org which has been happily taking up all of my spare time.
http://erlsoft.org is the [...]]]></description>
			<content:encoded><![CDATA[<p>I know I haven&#8217;t posted to this site in a very long time, but I have been working on some email related projects for quite some time.</p>
<p>I&#8217;ve been writing an open source email (SMTP,IMAP and POP) server in Erlang called ErlMail at <a href="http://erlsoft.org/" onclick="javascript:urchinTracker ('/outbound/article/erlsoft.org');">http://erlsoft.org</a> which has been happily taking up all of my spare time.</p>
<p><a href="http://erlsoft.org/" onclick="javascript:urchinTracker ('/outbound/article/erlsoft.org');">http://erlsoft.org</a> is the main website for the ErlMail project as well as a DNS server, VoIP server and a web server with its own markup language.</p>
<p>All of the projects are on Google Code now as well at the following URLs:</p>
<ul>
<li><a href="http://erlmail.googlecode.com/" onclick="javascript:urchinTracker ('/outbound/article/erlmail.googlecode.com');">http://erlmail.googlecode.com</a> - Email Server</li>
<li><a href="http://erlweb.googlecode.com/" onclick="javascript:urchinTracker ('/outbound/article/erlweb.googlecode.com');">http://erlweb.googlecode.com</a> - Web Server</li>
<li><a href="http://erldns.googlecode.com/" onclick="javascript:urchinTracker ('/outbound/article/erldns.googlecode.com');">http://erldns.googlecode.com</a> - DNS server</li>
<li><a href="http://erlvoip.googlecode.com/" onclick="javascript:urchinTracker ('/outbound/article/erlvoip.googlecode.com');">http://erlvoip.googlecode.com</a> - VoIP server</li>
<li><a href="http://eif.googlecode.com/" onclick="javascript:urchinTracker ('/outbound/article/eif.googlecode.com');">http://eif.googlecode.com</a> - Erlang Internet Framework - updates and manages the other projects</li>
<li><a href="http://erlyvideo.googlecode.com/" onclick="javascript:urchinTracker ('/outbound/article/erlyvideo.googlecode.com');">http://erlyvideo.googlecode.com</a> - Flash Media Server</li>
</ul>
<p>Well, that lets you know where I&#8217;m at and what I&#8217;ve been up to for the past few months.</p>
]]></content:encoded>
			<wfw:commentRss>http://spiderhunter.com/spiderl/131/current-project/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Sitemap Protocol for Yahoo</title>
		<link>http://spiderhunter.com/spider/129/sitemap-protocol-for-yahoo/</link>
		<comments>http://spiderhunter.com/spider/129/sitemap-protocol-for-yahoo/#comments</comments>
		<pubDate>Thu, 22 Mar 2007 17:06:02 +0000</pubDate>
		<dc:creator>simpleenigma</dc:creator>
		
		<category><![CDATA[Spider]]></category>

		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://spiderhunter.com/spider/129/sitemap-protocol-for-yahoo/</guid>
		<description><![CDATA[It is old news that Google uses the sitemap protocol which allows webmasters to tell Google&#8217;s&#160;spider,&#160;Googlebot, which web pages have been updated&#160;recently. The greatest advantage is that Google will use the sitemap to make sure they spider your web site in a more intelligent way that you control within the bounds of the sitemap protocol.
The [...]]]></description>
			<content:encoded><![CDATA[<p>It is old news that Google uses the <a href="http://www.google.com/webmasters/sitemaps/" onclick="javascript:urchinTracker ('/outbound/article/www.google.com');">sitemap</a> protocol which allows webmasters to tell Google&#8217;s&nbsp;spider,&nbsp;Googlebot, which web pages have been updated&nbsp;recently. The greatest advantage is that Google will use the sitemap to make sure they spider your web site in a more intelligent way that you control within the bounds of the sitemap protocol.</p>
<p>The sitemap protocol has grown beyond just Google these days. You can find out more information about the protocol itself at <a href="http://www.sitemaps.org/" onclick="javascript:urchinTracker ('/outbound/article/www.sitemaps.org');">sitemaps.org</a>. They give a good overview of the protocol, which is pretty simple, and in the FAQ they imply that other search engines are using the sitemaps protocol.</p>
<p>That fact turns out to be true. For some time now Yahoo has been using the sitemaps protocol as well. On their <a href="https://siteexplorer.search.yahoo.com/" onclick="javascript:urchinTracker ('/outbound/article/siteexplorer.search.yahoo.com');">Site Explorer</a> web site you can let them know about your sitemaps in much the same way that&nbsp;you would inform Google, and they also take feeds, such as RSS feeds, in addition to the sitemap protocol.</p>
<p>MSN is also supposed to be using the sitemap protocol, but I haven&#8217;t found the page to submit any sitemaps to.</p>
<p>At this point the only thing that I can think of that would make this better is a way to auto-discover sitemaps: something similar to the auto-discovery options for RSS feeds, such as a META tag that points to one or more sitemaps on your domain.</p>
<p>With this type of auto-discovery method any search engine could use the sitemap protocol and have an intelligent way to spider your site, without you having to log in to all of the different services that use sitemaps.</p>
<p>Technorati Tags: <a href="http://technorati.com/tag/sitemaps" rel="tag" onclick="javascript:urchinTracker ('/outbound/article/technorati.com');">sitemaps</a>, <a href="http://technorati.com/tag/Google" rel="tag" onclick="javascript:urchinTracker ('/outbound/article/technorati.com');">Google</a>, <a href="http://technorati.com/tag/Yahoo" rel="tag" onclick="javascript:urchinTracker ('/outbound/article/technorati.com');">Yahoo</a></p>]]></content:encoded>
			<wfw:commentRss>http://spiderhunter.com/spider/129/sitemap-protocol-for-yahoo/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Yet another site redesign</title>
		<link>http://spiderhunter.com/spiderhunter/122/yet-another-site-redesign/</link>
		<comments>http://spiderhunter.com/spiderhunter/122/yet-another-site-redesign/#comments</comments>
		<pubDate>Fri, 23 Feb 2007 17:09:22 +0000</pubDate>
		<dc:creator>simpleenigma</dc:creator>
		
		<category><![CDATA[SpiderHunter]]></category>

		<guid isPermaLink="false">http://spiderhunter.com/blogs/2007-02-23/122/</guid>
		<description><![CDATA[For the past several weeks I&#8217;ve been redesigning the site and I recently redirected the RealTimeSpiderList.com to point back here to SpiderHunter.com.
I&#8217;ve been moving my servers around and I am no longer hosting the servers at my home. I eventually plan to host these servers on my own hardware sometime off in the future, but [...]]]></description>
			<content:encoded><![CDATA[<p>For the past several weeks I&#8217;ve been redesigning the site and I recently redirected the RealTimeSpiderList.com to point back here to SpiderHunter.com.</p>
<p>I&#8217;ve been moving my servers around and I am no longer hosting the servers at my home. I eventually plan to host these servers on my own hardware sometime off in the future, but my own personal plans make hosting my own server nearly impossible at the moment.</p>
<p>So I&#8217;ve recreated the site as best I can on a WordPress blog. I&#8217;m still reworking some of the content to make sure the URLs are as similar as possible, but some of the content will be lost in the transition.</p>
<p>If you notice something that you want back, email me and I&#8217;ll re-post the content in blog form instead of articles or whatever format they were in.</p>
<p>Technorati Tags: <a href="http://technorati.com/tag/Spider+Hunter" rel="tag" onclick="javascript:urchinTracker ('/outbound/article/technorati.com');">Spider Hunter</a></p>]]></content:encoded>
			<wfw:commentRss>http://spiderhunter.com/spiderhunter/122/yet-another-site-redesign/feed/</wfw:commentRss>
		</item>
		<item>
		<title>IP Location Book</title>
		<link>http://spiderhunter.com/ip-tracking/121/ip-location-book/</link>
		<comments>http://spiderhunter.com/ip-tracking/121/ip-location-book/#comments</comments>
		<pubDate>Sun, 07 Jan 2007 19:03:00 +0000</pubDate>
		<dc:creator>simpleenigma</dc:creator>
		
		<category><![CDATA[IP Tracking]]></category>

		<guid isPermaLink="false">http://spiderhunter.com/blogs/2007-01-07/120/</guid>
		<description><![CDATA[I was wandering through my local book store and I happened to come across a book with an interesting title IP Location. It turns out to be a book on geolocation from several different points of view. I flipped through it and I&#8217;m planning on purchasing it soon. I also put it up on my [...]]]></description>
			<content:encoded><![CDATA[<p>I was wandering through my local book store and I happened to come across a book with an interesting title, <a href="http://spiderhunter.com/amazon/0072263776/IP_Location.html" >IP Location</a>. It turns out to be a book on geolocation from several different points of view. I flipped through it and I&#8217;m planning on purchasing it soon. I also put it up on my <a href="http://www.amazon.com/gp/registry/registry.html/103-8493593-6929444?ie=UTF8&amp;type=wishlist&amp;id=2MKJ5ITRWRJOE" onclick="javascript:urchinTracker ('/outbound/article/www.amazon.com');">amazon.com wish-list</a> in case anyone wants to make my week.</p>
]]></content:encoded>
			<wfw:commentRss>http://spiderhunter.com/ip-tracking/121/ip-location-book/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Netflix and SpidErl</title>
		<link>http://spiderhunter.com/spiderl/120/netflix-and-spiderl/</link>
		<comments>http://spiderhunter.com/spiderl/120/netflix-and-spiderl/#comments</comments>
		<pubDate>Sun, 05 Nov 2006 01:42:00 +0000</pubDate>
		<dc:creator>simpleenigma</dc:creator>
		
		<category><![CDATA[SpidErl]]></category>

		<guid isPermaLink="false">http://spiderhunter.com/blogs/2006-11-04/119/</guid>
		<description><![CDATA[For the past month I have been working on the Netflix Prize, trying to create some collaborative filters in pure Erlang. While I am nowhere near a winning score at the moment, I have learned a lot about how I want to create my own websites in a pure Erlang environment. One of the [...]]]></description>
			<content:encoded><![CDATA[<p>For the past month I have been working on the <a href="http://www.netflixprize.com" onclick="javascript:urchinTracker ('/outbound/article/www.netflixprize.com');">Netflix Prize</a>, trying to create some collaborative filters in pure <a href="http://www.erlang.org" onclick="javascript:urchinTracker ('/outbound/article/www.erlang.org');">Erlang</a>. While I am nowhere near a winning score at the moment, I have learned a lot about how I want to create my own websites in a pure Erlang environment. One of the more interesting things is that I will need my own spider, and in the standard Erlang way I have named it SpidErl. Erlang people like putting Erl into the names they give things.</p>
<p>SpidErl will mostly just fetch things in the first few versions; I&#8217;m going to be collecting RSS feeds and some Amazon affiliate data for the most part. I think I&#8217;m going to write a module that collects AdSense data as well; obviously you need a user name and password to get that data. In the long run I hope to create a full-fledged search engine around SpidErl&#8217;s information store, but that will be months or years from now.</p>
<p>In the meantime look for some notes about building my own spider and a site redesign taking advantage of the information that the spider collects <img src='http://spiderhunter.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /></p>
]]></content:encoded>
			<wfw:commentRss>http://spiderhunter.com/spiderl/120/netflix-and-spiderl/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Guess who&#8217;s cloaking now</title>
		<link>http://spiderhunter.com/cloaking/119/guess-whos-cloaking-now/</link>
		<comments>http://spiderhunter.com/cloaking/119/guess-whos-cloaking-now/#comments</comments>
		<pubDate>Tue, 20 Jun 2006 04:24:00 +0000</pubDate>
		<dc:creator>simpleenigma</dc:creator>
		
		<category><![CDATA[Cloaking]]></category>

		<guid isPermaLink="false">http://spiderhunter.com/blogs/2006-06-19/118/</guid>
		<description><![CDATA[In the most interesting Google and cloaking related story I&#8217;ve found lately, as you might notice from the lack of blogging, it seems that the New York Times is using cloaking technology on their web sites. I&#8217;m not sure who actually broke this story, but Danny Sullivan from search engine watch is quoted as saying [...]]]></description>
			<content:encoded><![CDATA[<p>In the most interesting Google and cloaking related story I&#8217;ve found lately, as you might notice from the lack of blogging, it seems that the New York Times is using cloaking technology on their web sites. I&#8217;m not sure who actually broke this story, but Danny Sullivan from <a href="http://www.searchenginewatch.com" onclick="javascript:urchinTracker ('/outbound/article/www.searchenginewatch.com');">search engine watch</a> is quoted as saying that what the New York Times is doing looks like cloaking in his opinion.</p>
<p>What is in fact happening is that Googlebot is able to see the contents of the paid area of the New York Times without actually paying. This is shown by the fact that when someone searches for content on the New York Times web site, the information displayed in the Google result pages is different from what a non-subscriber would get if they went to the New York Times website.</p>
<p>This has always been one of the arguments I have seen in favor of cloaking: the ability to have your content indexed by certain IP addresses for search engine spiders, while not allowing people to see the content without at least some sort of registration. In my mind this is 100% cloaking technology, but this is the type of cloaking that should be allowed. Keeping that in mind, the Google terms of service do not allow cloaking of any kind. So while I completely support this use of cloaking technology, it seems only fair that Google would have to ban all of the web pages from the New York Times website. This is an obvious violation of the terms of service, and if Google makes an exception for a prominent web site such as the New York Times they leave themselves open to potential legal actions from site owners who have been banned under this policy.</p>
<p>Personally I don&#8217;t like either of those options. My personal preference would be to see the Google terms of service modified in such a way that cloaking technologies are allowed so long as certain criteria are met, with the intention behind those criteria being to display the same or extremely similar content to both the search engine&#8217;s spider and the end user. Of course that opens up a completely different set of problems, where Google would end up having to view the contents of both the cloaked and uncloaked pages and then decide whether or not the cloaking technology was used appropriately. That gets us into a whole subjective realm of what constitutes similar content.</p>
<p>So the bottom line here is that it would appear that Google is damned if they do and damned if they don&#8217;t. The chances of getting the New York Times to stop using the cloaking technology would appear to be slim at best, and now that this is public knowledge, if Google does nothing they may see other consequences.</p>
<p><a href="http://blog.outer-court.com/archive/2006-06-19-n18.html" onclick="javascript:urchinTracker ('/outbound/article/blog.outer-court.com');">http://blog.outer-court.com/archive/2006-06-19-n18.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://spiderhunter.com/cloaking/119/guess-whos-cloaking-now/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Different GoogleBots doing the same work</title>
		<link>http://spiderhunter.com/spider/118/different-googlebots-doing-the-same-work/</link>
		<comments>http://spiderhunter.com/spider/118/different-googlebots-doing-the-same-work/#comments</comments>
		<pubDate>Fri, 19 May 2006 05:45:00 +0000</pubDate>
		<dc:creator>simpleenigma</dc:creator>
		
		<category><![CDATA[Spider]]></category>

		<category><![CDATA[GoogleBot]]></category>

		<guid isPermaLink="false">http://spiderhunter.com/blogs/2006-05-18/117/</guid>
		<description><![CDATA[I&#8217;m a big fan of Jenstar from webmaster world as well as her JenSense web site. For the most part I&#8217;m interested in the information about AdSense and other contextual advertising mediums, but occasionally she surprises me with information that is a bit more technical than I would expect. (Not that Jen isn&#8217;t technical, but [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a big fan of Jenstar from <a href="http://webmasterworld.com" onclick="javascript:urchinTracker ('/outbound/article/webmasterworld.com');">webmaster world</a> as well as her <a href="http://jensense.com/" onclick="javascript:urchinTracker ('/outbound/article/jensense.com');">JenSense</a> web site. For the most part I&#8217;m interested in the information about <a href="https://www.google.com/adsense/" onclick="javascript:urchinTracker ('/outbound/article/www.google.com');">AdSense</a> and other contextual advertising mediums, but occasionally she surprises me with information that is a bit more technical than I would expect. (Not that Jen isn&#8217;t technical, but her site is usually about advertising.)</p>
<p>About a month ago she made a post in which she shows that her log entries verified that the Google AdSense media partner GoogleBot was adding information into the Google search engine. While Google maintains that they have different search engine spiders for different purposes, it certainly does not surprise me that the information is being used across their different services.</p>
<p>In fact it would be the best use of their resources to have a single search engine spider that collected all information about a website and then have each of their different services use that one huge database differently. I do understand Google not wanting people to think that having an AdSense account would get them into the search engine results any faster. Running separate spiders just seems to me to be a huge waste of resources.</p>
<p><a href="http://www.jensense.com/archives/2006/04/adsense_mediapa.html" onclick="javascript:urchinTracker ('/outbound/article/www.jensense.com');">http://www.jensense.com/archives/2006/04/adsense_mediapa.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://spiderhunter.com/spider/118/different-googlebots-doing-the-same-work/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Google Site Maps META Tags</title>
		<link>http://spiderhunter.com/seo/117/google-site-maps-meta-tags/</link>
		<comments>http://spiderhunter.com/seo/117/google-site-maps-meta-tags/#comments</comments>
		<pubDate>Thu, 18 May 2006 05:50:00 +0000</pubDate>
		<dc:creator>simpleenigma</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://spiderhunter.com/blogs/2006-05-17/116/</guid>
		<description><![CDATA[Google announced recently on one of their blogs that they are now using META tags for site map validation. I personally think this is a fantastic idea and wish they would have done this from the beginning.
The Google site map program seems to be obsessed with proving that you own the page that you&#8217;re tracking. Which all [...]]]></description>
			<content:encoded><![CDATA[<p>Google announced recently on one of their blogs that they are now using META tags for site map validation. I personally think this is a fantastic idea and wish they would have done this from the beginning.</p>
<p>The Google site map program seems to be obsessed with proving that you own the page that you&#8217;re tracking, which all in all is not a bad thing, but initially they did not have enough ways to prove you&#8217;re the owner of the site. While they have been adding more ways to validate your site in the past few months, the initial way of validation was a file that could be put in the root of your web site.</p>
<p>In my case the site map service would not validate the file because my 404 error messages were not always returning 404 error codes. This was done on purpose and was something that I was unwilling to change.</p>
<p>Interestingly enough I had several of my sites that validated just fine. All the sites were on the same server and all the sites are configured in the exact same way, but for some reason about 25% validated fine while the rest were denied for not returning 404 error codes.</p>
<p>I have not tried the new META tag for site map validation yet as I am currently redesigning the core of many of my sites, but during this redesign I will be implementing the new META tag validation and I look forward to many other new features from site maps as well.</p>
<p><a href="http://sitemaps.blogspot.com/2006/05/more-about-meta-tag-verification.html" onclick="javascript:urchinTracker ('/outbound/article/sitemaps.blogspot.com');">http://sitemaps.blogspot.com/2006/05/more-about-meta-tag-verification.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://spiderhunter.com/seo/117/google-site-maps-meta-tags/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Javamail IMAP CFC</title>
		<link>http://spiderhunter.com/imap/116/javamail-imap-cfc/</link>
		<comments>http://spiderhunter.com/imap/116/javamail-imap-cfc/#comments</comments>
		<pubDate>Mon, 20 Mar 2006 19:35:00 +0000</pubDate>
		<dc:creator>simpleenigma</dc:creator>
		
		<category><![CDATA[IMAP]]></category>

		<guid isPermaLink="false">http://spiderhunter.com/blogs/2006-03-20/115/</guid>
		<description><![CDATA[I&#8217;ve had a few requests to have access to my JavaMail IMAP CFC so I&#8217;m releasing it for anyone who wants to play with it.
This is free open source code. No need for any licensing at all. There is no warranty and I will not be providing any technical support.
I highly recommend that you use the [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve had a few requests to have access to my JavaMail IMAP CFC so I&#8217;m releasing it for anyone who wants to play with it.</p>
<p>This is free open source code. No need for any licensing at all. There is no warranty and I will not be providing any technical support.</p>
<p>I highly recommend that you use the code as a basis for your own implementation and not use this code directly. Especially in a production environment.</p>
<p>I know of some issues that it has regarding how it does log-ins, and I had always intended to somehow cache the log-in so that it would be able to do multiple transactions at once. If you find the code helpful, please put up some comments about it, and if you feel like re-releasing the code I&#8217;d love to hear about it.</p>
<p><a href="http://www.spiderhunter.com/files/imap.zip" >Download JavaMail IMAP CFC</a></p>
]]></content:encoded>
			<wfw:commentRss>http://spiderhunter.com/imap/116/javamail-imap-cfc/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Cleaning up comments</title>
		<link>http://spiderhunter.com/spiderhunter/115/cleaning-up-comments/</link>
		<comments>http://spiderhunter.com/spiderhunter/115/cleaning-up-comments/#comments</comments>
		<pubDate>Sat, 18 Mar 2006 17:38:00 +0000</pubDate>
		<dc:creator>simpleenigma</dc:creator>
		
		<category><![CDATA[SpiderHunter]]></category>

		<guid isPermaLink="false">http://spiderhunter.com/blogs/2006-03-18/114/</guid>
		<description><![CDATA[I just went through and cleaned up the comments across this site. Seems some people felt it necessary to spam the blog comments. If I notice any more of this I&#8217;ll have to rework the comment code to limit or require registration, which I do not want to do&#8230;
And for the record, the comment sections have code [...]]]></description>
			<content:encoded><![CDATA[<p>I just went through and cleaned up the comments across this site. Seems some people felt it necessary to spam the blog comments. If I notice any more of this I&#8217;ll have to rework the comment code to limit or require registration, which I do not want to do&#8230;</p>
<p>And for the record, the comment sections have code that tells spiders not to follow any of the links and tells the Google AdSense code not to use the comments as part of the page when targeting ads. So from a spammer&#8217;s perspective, putting up your link on this site will not help you in any way &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://spiderhunter.com/spiderhunter/115/cleaning-up-comments/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
