How Web Scraping Can Be Devastating for Your SEO Score

Short for “search engine optimization,” SEO is one of the most important concepts to master if you run a website that you want people to discover. With most people nowadays accessing websites through a search engine, rather than manually typing the URL, good SEO practices are essential for getting your site to show up in relevant searches.

In a world in which searchers rarely move beyond the first or second page of a search engine’s results, good SEO makes all the difference between a website with high levels of organic traffic and one that only a handful of people will read. If that website sells a product, a strong SEO game could be the determining factor in making a business succeed or fail.

For obvious reasons, everyone with a website is chasing good SEO. But there are also bad actors who may harm your SEO, whether by wilfully targeting you as a competitor or simply by lazily copying content that you have created, often at great expense to yourself, to pass off as their own.

Welcome to the world of web scraping.

Scraping the web

Web scraping bots are automated agents that “scrape” the internet, lifting content and data from websites. They work in several different ways: parsing a site’s unique HTML structure, pulling data from its APIs, or extracting and transforming entire pieces of content. In every case, however, they are accessing your site’s data for their own purposes.
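To make the mechanics concrete, here is a minimal sketch of what a content-scraping bot might look like in Python, using the widely available requests and BeautifulSoup libraries. The target URL and CSS selectors are hypothetical placeholders, not any real site’s structure; real scrapers add crawling, retries, and evasion on top of this core loop.

```python
# Minimal illustration of a content-scraping bot. The URL and the CSS
# selectors below are hypothetical placeholders for a target site.
import requests
from bs4 import BeautifulSoup

URL = "https://example.com/blog/some-article"  # placeholder target

response = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"})
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Recognize the site's HTML structure and lift the content wholesale.
title = soup.select_one("h1")          # assumed page structure
paragraphs = soup.select("article p")  # assumed page structure

scraped = {
    "title": title.get_text(strip=True) if title else "",
    "body": "\n".join(p.get_text(strip=True) for p in paragraphs),
}
print(scraped["title"])
```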

Not all forms of web scraping are bad. For example, a price comparison site can only work effectively if it uses bots to extract pricing and product descriptions from various partner seller websites, so that offers can be ranked and compared against each other.

Companies engaged in market research might, meanwhile, use bots to pull user-generated content from social media sites for sentiment analysis, gauging enthusiasm for a new product or service, for instance.

Fighting back against the “appreciably similar”

But not all web scraping tools are quite so benign. A competitor might scrape your site in order to undercut your prices on the same products. This can be extremely damaging to a business: search engines will show the lower prices above your own, costing you customer visits and conversions.

If the targeted website’s value proposition is its content, a web-scraping bot could instead steal that content uncredited and republish it elsewhere. The problem is not simply that this splits your audience by giving them somewhere else to read the same material, depriving you of the rewards of your hard work. Search engines can also punish content that appears at more than one URL on the internet; Google refers to such pages as “appreciably similar.” The reason for the downgrade is that, not wanting to serve searchers a results page in which multiple entries are the same, and unable to tell which version is the original, a search engine may rank every copy lower.
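Search engines don’t publish the exact test they apply, but a common textbook technique for spotting near-duplicate pages is word shingling combined with Jaccard similarity. The sketch below is a simplified illustration of that idea only, not Google’s actual “appreciably similar” algorithm, and the sample strings are invented.

```python
# Simplified near-duplicate check using word shingles and Jaccard
# similarity -- an illustration of the general idea, not a search
# engine's actual duplicate-content test.
def shingles(text: str, k: int = 3) -> set:
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

original = "Good SEO practices are essential for getting your site discovered."
scraped = "Good SEO practices are essential for getting your website discovered."

similarity = jaccard(shingles(original), shingles(scraped))
print(f"Similarity: {similarity:.2f}")  # a score near 1.0 flags a likely duplicate
```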

In addition, search engines like Google use inbound links to gauge a website’s value (the insight behind Larry Page and Sergey Brin’s PageRank algorithm, which launched Google). The idea is that, just as you might check out a movie because several people have independently recommended it, a website that receives links from many other websites is considered a valuable one. If duplicated content causes the same information to appear on multiple websites, however, that link equity is split between several different sources.
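As a rough illustration of how link equity gets split, here is a simplified version of the classic PageRank power iteration. The graph, page names, damping factor, and iteration count are illustrative assumptions; this is the original algorithm’s core idea, not how Google computes rankings today.

```python
# Simplified PageRank power iteration. Each page's score is spread
# evenly across its outbound links, so links pointed at a scraped copy
# divert equity that would otherwise accrue to the original.
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outbound in links.items():
            if not outbound:
                continue
            share = rank[page] / len(outbound)  # equity split across links
            for target in outbound:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

# Hypothetical graph: "blog_a" links to both the original article and
# the scraped copy, so its link equity is split between the two URLs.
graph = {
    "original": [],
    "copy": [],
    "blog_a": ["original", "copy"],
    "blog_b": ["original"],
}
print(pagerank(graph))
```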

Protect what you’ve built

For all of these reasons, protecting your site against malicious web scraping (scraping of your website done without your permission) is a smart idea. Web scraping can cost you money and reputation. While a content-scraping attack isn’t as overtly destructive as a DDoS (distributed denial of service) attack, the results can nonetheless be devastating.

Fortunately, tools exist to help you counter this threat. A WAF (web application firewall) can sort legitimate from illegitimate traffic as it reaches your website. Techniques like HTML fingerprinting and IP reputation help distinguish bad actors from good, while progressive challenges, such as checking for cookie support and JavaScript execution, weed out bad bots when and where they appear.
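As a rough sketch of one such progressive challenge, the hypothetical Flask handler below only serves content to clients that accept and return a cookie, something naive scrapers often fail to do. A real WAF layers many checks together; this toy check alone is trivially bypassed and is shown only to make the concept concrete.

```python
# Hypothetical sketch of a cookie-support challenge in Flask -- a toy
# version of one progressive check that a real WAF would combine with
# HTML fingerprinting, IP reputation, and JavaScript execution tests.
from flask import Flask, make_response, request

app = Flask(__name__)

@app.route("/article")
def article():
    if request.cookies.get("challenge") != "ok":
        # First visit: issue the cookie and ask the client to retry.
        resp = make_response("Checking your browser...", 503)
        resp.set_cookie("challenge", "ok")
        resp.headers["Retry-After"] = "1"
        return resp
    # The cookie came back: likely a real browser, so serve the content.
    return "Article content here."

if __name__ == "__main__":
    app.run()
```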

SEO is like your reputation online: it’s what keeps you in good standing. Losing that standing, or having it split and compromised, can have a detrimental impact on your website and, quite possibly, your livelihood. Sadly, bots let attackers inflict this kind of damage automatically and at scale. Solve this problem, and you’ll go a long way toward eliminating one of the biggest challenges you face on the road to success online.
