Google has been discussing the issue of duplicate content on the Webmaster Central Blog.
Duplicate content refers to blocks of content that are the same or very similar within a domain or across different domains. Content is sometimes duplicated deliberately in an attempt to manipulate search engine rankings or attract more traffic to a website, something Google seeks to deal with.
In other cases, though, websites have legitimate reasons for displaying duplicate content, such as versions of web pages that have been optimized for printing, or pages intended for mobile devices.
Sven Naumann of the Google Search Quality team defines two different types of duplicate content issues:
Duplicate content within a domain:
This is often unintentional, and occurs when content from one page of a website appears on other pages within the same site. Naumann suggests that webmasters should block Google from indexing these areas of the site.
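As a rough illustration of what blocking a duplicate area might look like, the sketch below assumes a site that keeps its printer-friendly pages under a /print/ path and uses a robots.txt rule to keep Googlebot out of it. The path, the example.com URLs and the is_crawlable helper are all hypothetical and do not come from Google's post; robots.txt is only one common way to do this, and strictly speaking it controls crawling rather than indexing. The check uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. The /print/ path is an
# assumption standing in for "printer-friendly duplicates of normal pages".
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /print/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The printer-friendly duplicate should be blocked, the original allowed.
    for url in (
        "https://www.example.com/print/article-1",
        "https://www.example.com/article-1",
    ):
        print(url, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```

A check like this could be run before a robots.txt change goes live, to confirm that only the duplicate versions are excluded and the original pages remain reachable.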
Duplicate content across domains:
This is where identical content from one site appears across multiple websites. This could be due to the syndication of articles, in which case Google recommends that the syndicated article link back to the original, to help the search engine identify the original source.
In other cases, third parties may have ‘scraped’ content to republish it on other sites, normally to profit from the traffic. Spam blogs, for example, scrape content in order to earn income through AdSense.
According to Naumann though, websites need not worry about this:
“When encountering such duplicate content on different sites, we look at various signals to determine which site is the original one, which usually works very well. This also means that you shouldn’t be very concerned about seeing negative effects on your site’s presence on Google if you notice someone scraping your content.”
As indicated in some of the comments on the blog, we are not so sure that we can all rest easy about site scrapers…