The year is 2023, and people are still heavily divided into two camps when it comes to duplicate content: those who panic unnecessarily, and those who could really do with panicking a bit more. I’ve come across people who will rewrite an individual sentence if it’s too close to one on another page of their site, and I’ve come across others who will happily duplicate an entire website, no shits (or canonical tags) given (!).
If you’re even reading this blog in the first place, you probably fall into the former category.
There’s still a lot of confusion surrounding the topic, including how close is too close, how much impact it really has, and, probably most commonly, what normally causes duplicate content issues.
What is duplicate content?
Duplicate content is simply content that appears online in more than one place, and that could be between pages of your own website, or between your site and another web page.
When there is more than one version of a piece of content – either identical or very similar – Google sometimes struggles to understand which version to rank. In theory, if you have a piece of content that is yours, and you published first, and have a more authoritative site, Google should prioritise yours. However, Google is notorious for getting it wrong.
How big a deal is it?
You’ll hear people talk of a duplicate content penalty, but in reality such a thing doesn’t exist. You won’t find any Manual Actions for your site in your Search Console account if a few of your service pages are a bit too close for comfort.
It’s much more about missed opportunities than it is about being punished. If your pages are competing against each other unnecessarily, then you could be missing out on traffic.
What counts as duplicate content?
Google is clever enough to understand that things like footer content are typically duplicated across a site, and doing so won’t result in any duplicate content issues.
Similarly, straplines, and even key paragraphs that feature across several pages (key service pages, for example), won’t do you any harm either. If that kind of repeated content makes up quite a high proportion of some important pages that you’d like to rank, then rather than cutting it down, think about how you could add new, original and useful copy to those pages to bring the percentage of duplicate content down.
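If you want a rough feel for that percentage, here is a minimal sketch in Python. It assumes that "duplicate" means an identical sentence (after lowercasing and tidying whitespace) appearing on another page, which is my simplification and certainly not how Google measures it. The function names and sample copy are made up for illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text into rough sentences, lowercased with whitespace collapsed."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def duplicate_share(page: str, other: str) -> float:
    """Fraction of sentences in `page` that also appear word for word in `other`."""
    sentences = split_sentences(page)
    if not sentences:
        return 0.0
    other_sentences = set(split_sentences(other))
    shared = sum(1 for s in sentences if s in other_sentences)
    return shared / len(sentences)


# Hypothetical service pages: the second adds original copy around the shared lines.
service_a = "We are a family firm. Call us today for a quote."
service_b = (
    "We are a family firm. We fit boilers across Leeds. "
    "Every job is guaranteed. Call us today for a quote."
)

print(duplicate_share(service_a, service_b))  # 1.0
print(duplicate_share(service_b, service_a))  # 0.5
```

The second page still carries both shared sentences, but the two original ones halve its duplicate share, which is the point of adding copy rather than cutting.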
Typically though, this isn’t the cause behind most duplicate content issues.
What are the most common causes of duplicate content?
URL variations
Whilst people tend to worry about the physical wording of their pages, duplicate content issues are more likely to arise from URL variations.
If you’ve got separate versions of your site with the ‘www’ and without, or if you’ve got http:// and https:// pages, then technically you’ll have the same content in two separate places, as each version has its own unique URL. When this is the case, you’ll obviously have vast amounts of duplicate content.
This is relatively straightforward to fix: use 301 redirects to send the duplicate URLs to the correct one, and canonical tags to specify which of the duplicates is the correct version.
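To see which URLs are really the same page wearing different outfits, you could group a list of crawled URLs by a simplified key. This is a sketch rather than a finished tool: it assumes that scheme, a leading ‘www.’ and a trailing slash are the only differences that don’t matter, and the example.com URLs and function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def variant_key(url: str) -> str:
    """Reduce a URL to a key that ignores scheme, a leading 'www.' and a trailing slash."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    key = host + path
    # Keep the query string, since it can select genuinely different content.
    if parts.query:
        key += "?" + parts.query
    return key


def group_variants(urls: list[str]) -> dict[str, list[str]]:
    """Return only the keys that are reachable through more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[variant_key(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


def canonical_tag(preferred_url: str) -> str:
    """Build the <link> element that goes in the <head> of each duplicate."""
    return f'<link rel="canonical" href="{preferred_url}">'


urls = [
    "http://example.com/services/",
    "https://www.example.com/services",
    "https://example.com/services",
    "https://www.example.com/contact",
]

print(group_variants(urls))
print(canonical_tag("https://www.example.com/services"))
```

Each group that comes back is a candidate for a single preferred URL, with the others 301 redirected to it. Setting up the redirects themselves depends on your server or CMS, so that part isn’t shown.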
Content that’s too similar on your own site
As we’ve already touched upon, this is most likely to be the result of chunks of content replicated across pages where there’s a low volume of content overall. There’s no urgent issue here, but you can reduce the proportion of duplicate content by adding new original content.
Duplicate content from an external source
Occasionally you might find your own content appearing on another site without your awareness or approval. Unfortunately not everyone on the internet is an upstanding citizen. Who knew.
Of course this is incredibly frustrating, but simply contacting the site owners and requesting that the content is removed is often enough for them to take it down (and actually worked for me on behalf of one of my clients last week). I would recommend taking a screenshot first though, in case you need to take it further, and it might be worth bookmarking the site so you can keep an eye on it in future.
If they don’t respond or refuse to remove the content, you can file a DMCA notice with Google.
Checking for duplicate content
Whilst I personally feel that as an industry we tend to overreact when it comes to duplicate content, it is wise to add it to your checklist, both so that you’re not holding back your organic search performance and so that you’re aware sooner rather than later if anyone is plagiarising your work.
Tools like Copyscape can help you spot duplicate content issues, but to be honest, regular SEO crawls will pick up most issues, e.g. duplicate content from URL variations.
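If you’re curious how a crawl can flag this sort of thing, one simple approach is to fingerprint each page’s text and group URLs that share a fingerprint. The sketch below assumes you already have the extracted text for each URL and only catches copy that is identical after lowercasing and whitespace tidying. It is not how Copyscape or any particular crawler works, and the names and sample pages are made up.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose page text produces the same fingerprint."""
    by_hash = defaultdict(list)
    for url, text in pages.items():
        by_hash[fingerprint(text)].append(url)
    return [found for found in by_hash.values() if len(found) > 1]


# Hypothetical crawl output: URL mapped to the visible text of the page.
crawled = {
    "https://example.com/boilers": "Boiler servicing in Leeds.",
    "http://www.example.com/boilers": "Boiler  servicing in leeds.",
    "https://example.com/contact": "Get in touch.",
}

print(find_exact_duplicates(crawled))
# [['https://example.com/boilers', 'http://www.example.com/boilers']]
```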