Attack of the Scraper Bots

by Werner deBruin

There’s no sleep for the righteous. Or so I found out recently when we noticed a drop in the rankings of one of my clients. I started off with the usual checks.

  1. When was the site last indexed?
  2. Any new error messages on Google Webmaster Tools?
  3. Any broken links?
  4. Back link audit?
  5. Any new Google Algorithm updates?

OK, so with 5 check marks, what could have caused the downward curve on the ranking rollercoaster? At first I thought it was merely a weekly fluctuation, but then I noticed every strategist’s nightmare - un-indexed pages. (Noooooooo!!!)

Through further investigation, I uncovered a nightmare for any site owner – we had become victims of Scraper Bots.

Scraper Bots

Software that scrapes the internet to find competing sites, then rips their source code with the intent of duplicating the entire site on a different domain.

The question is, how would this benefit anybody?

Imagine adding fresh new content to your site, only to have it spidered 5 days later. In the meantime, without the site owner’s knowledge, the duplicate site gets indexed first. The search engines would know it’s a duplicate, but who was the original author? This might not sound too bad now – you’ve always done everything within the rules, and have an outstanding reputation with the search engines.

Imagine having 5, 10 or 50 of these sites out there - raising search engine suspicions slightly, but maybe just enough to cause some doubt.

Having identified the problem as SWIDSER (Sabotaging a Website with the Intent to Damage Search Engine Rankings), how do we get rid of it?

Firstly we need to notify the search engines of these sites. This is easily done via the simple reporting tools that all the major search engines offer, although it takes them time to investigate…

So now we need to stop this from happening. First you would need to find the IP address of the bot. This will take some thorough investigation of your web stats. Any web stats package should be able to supply you with the IP address of each bot that visits your website. Then I suggest one or more of the following:

  1. Ye Old Trusty Firewall
    Setting up a firewall does require some professional help, but it would be your first line of defense against these types of threats. Speak to your server technicians for assistance.
  2. Server side
    If you are running an Apache server, you can add a few lines to your access files (such as .htaccess):

    <IfModule mod_access.c>
            Order deny,allow
            Deny from IP1 IP2 IP3
    </IfModule>

    * IP1, IP2 and IP3 represent the different IP addresses you want to block. Separate them with spaces, not commas.
    * On Apache 2.2 the module was renamed mod_authz_host, so the IfModule line needs to name mod_authz_host.c instead, otherwise the block is skipped silently.
  3. On-page
    If you are making use of PHP or ASP, you can add a simple piece of code that checks which IP a request is coming from and, if necessary, forces a white screen or even redirects the bot to a different website.
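To make the on-page idea a little more concrete, here is a minimal sketch of the check itself. It is written in Python purely for illustration; the same logic carries over to PHP or ASP. The names `BLOCKED`, `is_blocked` and `respond` are hypothetical, the addresses are reserved documentation ranges rather than real scrapers, and answering with a 403 and an empty body is just one way to produce the "white screen".

```python
import ipaddress

# Hypothetical blocklist: single addresses or CIDR ranges.
# These values come from reserved documentation ranges, not real bots.
BLOCKED = ["203.0.113.7", "198.51.100.0/24"]
BLOCKED_NETWORKS = [ipaddress.ip_network(entry) for entry in BLOCKED]


def is_blocked(remote_addr):
    """Return True if the visitor's IP falls inside any blocked entry."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address; this sketch simply lets it through.
        return False
    return any(addr in network for network in BLOCKED_NETWORKS)


def respond(remote_addr, page_html):
    """Return (status, body): an empty page for blocked IPs, the real page otherwise."""
    if is_blocked(remote_addr):
        return 403, ""
    return 200, page_html
```

In a real page you would pass in whatever your platform reports as the visitor's address (in PHP that is usually `$_SERVER['REMOTE_ADDR']`). Keep in mind that a scraper can change IP addresses, so a check like this is only one layer of defense.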

** Unfortunately I could not add any examples of these scraper sites. I would rather not supply them with another link.
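If your web stats package does not make the bot's IP address easy to find, the raw server log can be tallied directly. The sketch below is an assumption-laden illustration in Python: it assumes the log is in Apache's Common or Combined Log Format, and the names `LINE_RE` and `top_clients` are made up for this example. It counts requests per client IP and collects the user agents each IP reported, so that unusually busy visitors stand out.

```python
import re
from collections import Counter

# Common/Combined Log Format: the client IP is the first field and,
# in the combined format, the user agent is the last quoted field.
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+(?: "[^"]*" "([^"]*)")?'
)


def top_clients(lines, limit=10):
    """Return (ip, hits, user_agents) tuples for the busiest client IPs."""
    hits = Counter()
    agents = {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not look like access-log entries
        ip = match.group(1)
        agent = match.group(2) or "-"  # common format carries no user agent
        hits[ip] += 1
        agents.setdefault(ip, set()).add(agent)
    return [(ip, count, sorted(agents[ip])) for ip, count in hits.most_common(limit)]


# Usage (the file name is hypothetical):
# with open("access.log") as log:
#     for ip, count, seen_agents in top_clients(log):
#         print(ip, count, seen_agents)
```

A high request count on its own does not prove that an IP belongs to a scraper, since legitimate search engine crawlers are busy visitors too. Treat the output as a shortlist to investigate before blocking anything.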

2007/04/13