Werner deBruin

Attack of the Scraper Bots

by Werner deBruin

2007/04/13

There’s no sleep for the righteous. Or so I found out recently when we noticed a drop in the rankings of one of my clients. I started off with the usual checks.

  1. When was the site last indexed?
  2. Any new error messages on Google Webmaster Tools?
  3. Checked for any broken links?
  4. Back link audit?
  5. Any new Google Algorithm updates?

Ok, so with 5 check marks what could have caused the downwards curve on the ranking rollercoaster? At first I thought that it was merely a weekly fluctuation, but then I noticed every strategist’s nightmare: un-indexed pages. (Noooooooo!!!)

Through further investigation, I uncovered a nightmare for any site owner – we had become victims of Scraper Bots.

Scraper Bots

Software that scrapes the internet to find competing sites, then rips their source code with the intent of duplicating the entire site on a different domain.

The question is, how would this benefit anybody?

Imagine adding fresh new content to your site, only to have it spidered 5 days later. In the meantime, without the knowledge of the site owner, the duplicate site gets indexed first. The search engines would know it’s a duplicate, but who was the original author? This might not sound too bad now – you’ve always done everything within the rules, and have an outstanding reputation with the search engines.

Imagine having 5, 10 or 50 of these sites out there. Each one raises search engine suspicions only slightly, but maybe just enough to cause some doubt.

Having identified the problem as SWIDSER (Sabotaging a Website with the Intent to Damage Search Engine Rankings), how do we get rid of it?

Firstly we need to notify the search engines of these sites. This is easily done via the reporting tools that all the major search engines offer, although they take time to investigate…

So now we need to stop this from happening. First you would need to find the IP address of the bot. This will need some thorough investigating through your web stats. All web stats packages should be able to supply you with the IP address of each bot that visits your website. Then I suggest one or more of the following:

  1. Ye Old Trusty Firewall
    Setting up a firewall does require some professional help, but it would be your first line of defense against these types of threats. Speak to your server technicians for assistance.
  2. Server side
    If you are running an Apache server, you can add a few lines to your access files:

    <IfModule mod_access.c>
        Order deny,allow
        Deny from IP1 IP2 IP3
    </IfModule>

    * IP1, IP2 and IP3 represent the different IP addresses you want to block, separated by spaces. On Apache 2.2 the module is named mod_authz_host.c rather than mod_access.c.
  3. On-page
    If you are making use of PHP or ASP, you can add a short piece of code that checks which IP address a request is coming from and, if necessary, serves a blank white page or even redirects the bot to a different website.
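To make the on-page idea a bit more concrete, here is a minimal sketch of the IP check. It is written in Python as a small WSGI application rather than in PHP or ASP, so treat it as an illustration of the logic only; the same check in PHP or ASP would follow the same shape. The names `BLOCKED_IPS`, `is_blocked` and `application`, as well as the addresses in the list, are made up for this example.

```python
# Hypothetical blocklist: replace with the IP addresses found in your web stats.
BLOCKED_IPS = {"192.0.2.10", "198.51.100.7"}


def is_blocked(remote_addr):
    """Return True when the visitor's IP address is on the blocklist."""
    return remote_addr in BLOCKED_IPS


def application(environ, start_response):
    """WSGI entry point: blank page for blocked IPs, normal page otherwise."""
    remote_addr = environ.get("REMOTE_ADDR", "")
    if is_blocked(remote_addr):
        # "Force a white screen": answer with 403 and an empty body.
        headers = [("Content-Type", "text/plain"), ("Content-Length", "0")]
        start_response("403 Forbidden", headers)
        return [b""]
    body = b"Welcome to the real site."
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]
```

To redirect the bot instead of showing a blank page, the blocked branch could respond with a `302 Found` status and a `Location` header. Keep in mind that this only works while the bot keeps using the same IP addresses.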

** Unfortunately I could not add any examples. I would rather not supply these sites with another link.
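For the earlier step of finding the bot's IP address, a stats package is not strictly needed if you have access to the raw server log. The sketch below counts requests per IP address, assuming an Apache-style access log in which the client IP is the first field on each line. The function name `top_visitors` and the sample log lines are invented for illustration; a real log would be read from a file.

```python
from collections import Counter


def top_visitors(log_lines, limit=5):
    """Count requests per client IP, assuming the IP is the first field."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(limit)


# Made-up sample lines in Apache's Common Log Format.
sample_log = [
    '192.0.2.10 - - [13/Apr/2007:10:00:01 +0200] "GET / HTTP/1.1" 200 5120',
    '192.0.2.10 - - [13/Apr/2007:10:00:02 +0200] "GET /about.html HTTP/1.1" 200 2048',
    '203.0.113.5 - - [13/Apr/2007:10:03:40 +0200] "GET / HTTP/1.1" 200 5120',
    '192.0.2.10 - - [13/Apr/2007:10:00:03 +0200] "GET /contact.html HTTP/1.1" 200 1024',
]

for ip, hits in top_visitors(sample_log):
    print(ip, hits)
```

An address that requests every page of the site within a few seconds, as the first one does here, is a likely candidate for a closer look. A high request count alone is not proof, since legitimate search engine spiders behave in a similar way.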
