Sphere Blog Search, crawling 9 pages in 16 seconds
Friday, March 23rd, 2007
I love web crawlers. They index pages and bring readers from search-engines. But some web crawlers are just annoying. Like those gathering e-mail addresses for spammers. And Sphere Scout, which has a very odd hit-grab-and-run behavior.
Sphere Scout visited my blog, fetched robots.txt to check for permission to crawl, and then grabbed 9 pages in 16 seconds - and that was it.
64.40.115.32 - - [23/Mar/2007:02:42:05 -0400] “GET /robots.txt HTTP/1.0″ 200 24 “-” “Sphere Scout&v4.0 (beta) - scout at sphere dot com”
64.40.115.32 - - [23/Mar/2007:02:42:06 -0400] “GET / HTTP/1.0″ 200 34940 “-” “Sphere Scout&v4.0 (beta) - scout at sphere dot com”
64.40.115.32 - - [23/Mar/2007:02:42:09 -0400] “GET /2007/01/12/are-you-sure-your-backup-routines-are-sufficient/ HTTP/1.0″ 200 15576 “-” “Sphe”
64.40.115.32 - - [23/Mar/2007:02:42:12 -0400] “GET /2007/02/21/creative-seo-whos-there-google-heres-a-page-just-for-you/ HTTP/1.0″ 200 15177 “”
64.40.115.32 - - [23/Mar/2007:02:42:14 -0400] “GET /2007/03/12/youd-be-shocked-and-amazed-if-you-knew-what-theyre-searching-for/ HTTP/1.0″ 200″
64.40.115.32 - - [23/Mar/2007:02:42:17 -0400] “GET /2007/02/09/yet-another-creative-google-clone-spammed HTTP/1.0″ 200 17297 “-” “Sphere Scout”
64.40.115.32 - - [23/Mar/2007:02:42:19 -0400] “GET /2007/03/12/youd-be-shocked-and-amazed-if-you-knew-what-theyre-searching-for HTTP/1.0″ 200 “
64.40.115.32 - - [23/Mar/2007:02:42:21 -0400] “GET /2007/03/22/vigilant-a-pretty-cool-word HTTP/1.0″ 200 12409 “-” “Sphere Scout&v4.0 (beta) -”
64.40.115.32 - - [23/Mar/2007:02:42:23 -0400] “GET /2006/10/11/the-enormous-power-of-plain-text-e-mail-security/ HTTP/1.0″ 200 14138 “-” “Sphe”
64.40.115.32 - - [23/Mar/2007:02:42:25 -0400] “GET /2007/03/22/vigilant-a-pretty-cool-word/ HTTP/1.0″ 200 12409 “-” “Sphere Scout&v4.0 (beta) “
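To put a number on the pacing, here is a minimal sketch that measures the gap between consecutive requests. The timestamps are copied from the ten log lines above; the function name `request_gaps` is made up for illustration, and the `%b` month parsing assumes an English locale.

```python
from datetime import datetime

# Timestamps copied from the ten log lines above
# (the robots.txt request plus the page requests that follow it).
STAMPS = [
    "23/Mar/2007:02:42:05 -0400",
    "23/Mar/2007:02:42:06 -0400",
    "23/Mar/2007:02:42:09 -0400",
    "23/Mar/2007:02:42:12 -0400",
    "23/Mar/2007:02:42:14 -0400",
    "23/Mar/2007:02:42:17 -0400",
    "23/Mar/2007:02:42:19 -0400",
    "23/Mar/2007:02:42:21 -0400",
    "23/Mar/2007:02:42:23 -0400",
    "23/Mar/2007:02:42:25 -0400",
]


def request_gaps(stamps):
    """Return the seconds elapsed between each pair of consecutive requests."""
    times = [datetime.strptime(s, "%d/%b/%Y:%H:%M:%S %z") for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


gaps = request_gaps(STAMPS)
print("gaps in seconds:", gaps)
print("mean gap: %.1f s" % (sum(gaps) / len(gaps)))
```

Every gap comes out between 1 and 3 seconds, which is where the "a page every 2 seconds" figure comes from.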
There is nothing wrong with crawling the web. Every search-engine has to. I started using Google as my #1 search engine many years ago, and I still do for two reasons:
- I always find exactly what I’m looking for (this may have something to do with me knowing how to use its more advanced functions)
- It’s fast. Result 1-10 of 3830000 in 0.05 seconds? It’s hard to make a static web page load that fast.
But some of their actions in recent years are at best very questionable, so it makes me happy to see that other search-engines are at least trying to give them competition. Like the blog-search-engine Sphere. But hammering a site with a page request every 2 seconds?

If every new & supposedly “next big thing” search-engine did that then it’d kill the web and that would be the end of it. That’s probably an overstatement, but still: Most web crawlers don’t rush. They download a page, wait a while, and then download another page. They usually take their time. This prevents a single bot, or a handful of bots that happen to hit the same site, from putting noticeable load on a webserver. But those running “Sphere Scout” don’t get that: they want all content and they want it now.
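The "download a page, wait a while, download another page" pattern is easy to sketch. This is not how any particular crawler is implemented; the function name `crawl_politely` and the 10-second pause are assumptions picked for illustration, and the fetcher and sleep function are passed in so the loop can be tried without touching the network.

```python
import time


def crawl_politely(urls, fetch, delay_seconds=10.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing delay_seconds between requests.

    fetch is any callable that takes a URL and returns the page body.
    sleep defaults to time.sleep and can be swapped out for a dry run.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        pages[url] = fetch(url)
    return pages


# Dry run: a stand-in fetcher, and a "sleep" that only records the pauses.
pauses = []
result = crawl_politely(
    ["/", "/2007/03/22/vigilant-a-pretty-cool-word/"],
    fetch=lambda url: "<html>%s</html>" % url,
    delay_seconds=10.0,  # hypothetical value, not a standard
    sleep=pauses.append,
)
print(pauses)
```

With a real fetcher and the default `time.sleep`, the same loop spreads its requests out instead of firing them back to back.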
What’s Sphere, anyway?
It’s a blog-search-engine. A pretty bad one at that.
Speed? Sphere is so slow it’s ridiculous. It really is very hard to make a search-engine come close to Google’s speed, but Sphere is just… way too slow.
Results? I tried a search for “911 inside job” and it only managed to find 43 links. Technorati, another way too slow blog-only search-engine, has page after page of results for the term “911 inside job”. It doesn’t say how many; you have to click next, and it requires a referrer when using &start=200 etc., but from what I bothered to check (without changing start=200 using a fake referrer field, which I briefly considered) it’s got thousands of results for that term. Google, as always, p0wns them both with its incredible “about 40,850 for 911 inside job. (0.76 seconds)”.
They’ve also got a whole lot of “Tools” such as browser extensions and widgets which they encourage bloggers to install on their sites. I read their “sphere it, tools and tips” page and after careful consideration for about 0.9 seconds found that their most advanced browser extension is a search plugin which does the job Google’s related: queries do, and their widgets - which show “post-related search results” - looked like a more annoying version of Google AdSense - only without payment.
I found that the “Social bookmarks” widget I use - which I plan on rewriting, btw - has Sphere in it (it has like 60 sites you can choose between) - so I’m going to check if having the button has any effect on their crawling behaviour over the next few weeks. Will it visit more frequently, perhaps? If it does then I may actually remove the button and warn other people about having it, since a page per 2 seconds is just totally unacceptable crawler behaviour.
In bullet summary:
- Sphere really should consider increasing their bot’s crawl-delay from 2 seconds, and
- Their blog-search-engine is ridiculous: it’s slow, it finds nothing and it wouldn’t even pass as decent back in 1998.
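On the site owner's side, one way to ask for a slower pace is the Crawl-delay line in robots.txt. It is a non-standard directive that only some crawlers honour, and nothing in the log above says whether Sphere Scout is one of them. The robots.txt content and the 10-second value below are hypothetical; the sketch only shows how a well-behaved crawler could read the value with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to wait 10 seconds
# between requests. Crawl-delay is non-standard; support varies by crawler.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler would look up the delay for its own user agent
# and sleep that long between page requests.
delay = parser.crawl_delay("Sphere Scout")
print("requested delay in seconds:", delay)
```

A crawler that reads this value could feed it straight into whatever pause it uses between page requests.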



