We’ve recently had clients asking about warnings they got from Google Search Console. We checked, and their sites were fine. Interestingly, the warnings came up because the site settings were working correctly to prevent a serious problem!
The problem: Duplicate Content.
What is duplicate content?
Google sends “bots” all over the internet to find websites and pages to read. After a bot scans a page, Google stores the page in an index so it can show that page to people who use the search engine.
Duplicate content occurs when Google finds a page or some text that is identical or very similar to another page.
This can happen when the matching copies are on separate sites, or within the same site.
What is Google trying to do?
When a user searches for some keywords, Google thinks about what they want. Then it tries to give them some links that will lead them to the best webpage.
To do that, Google needs to consider the content on every page it knows about and rank those pages for the relevant keywords.
Thus, duplicate content
Google needs to remember and rank content on various pages. So if it finds identical text, it has to think, “which one gets the ranking power for keyword XYZ?” How does Google know which page/text is more valuable?
It can’t always figure this out, so it picks one version to show, often the one it found first. That may not be the page you wanted to rank.
How is it caused?
You might be thinking, “ok, no problem. I’m not copying text. My content is all original.”
And that’s a good measure to take. Writing original content in your own words that is meant to be useful to your site visitors should mostly keep your site’s reputation in good shape.
But a lot of duplicate content may be hiding in your site, unintentionally created by your settings!
URL settings
Search engines remember pages by their URLs. Each distinct URL is treated as a separate page.
Let’s say you have a page called “our super services.”
If you’re not careful about your settings, this page could potentially be indexed up to 4 times!
Here’s how:
- http://mycompany.com/our-super-services/
- https://mycompany.com/our-super-services/
- http://www.mycompany.com/our-super-services/
- https://www.mycompany.com/our-super-services/
You only meant to make one page. But if each of these URLs gets indexed, each one will be seen as a separate page with content identical to the others.
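To make this concrete, here is a minimal Python sketch of how you might check whether a list of URLs really points at one page. The helper name page_key is made up for this post, and the rule it applies (ignore http vs. https and a leading "www.") is only an assumption about which differences you want to treat as "the same page." It is not how Google itself decides.

```python
from urllib.parse import urlsplit

def page_key(url):
    """Return a key that ignores the scheme and a leading 'www.' (illustrative rule only)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path

# The four variants from the list above.
variants = [
    "http://mycompany.com/our-super-services/",
    "https://mycompany.com/our-super-services/",
    "http://www.mycompany.com/our-super-services/",
    "https://www.mycompany.com/our-super-services/",
]

# A set keeps only the distinct keys.
keys = {page_key(u) for u in variants}
print(len(variants), "URLs,", len(keys), "distinct page(s)")
```

If the number of distinct pages is smaller than the number of URLs, several addresses are serving the same content, and it is worth making sure only one of them can be indexed.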
Blog archive pages
If you use a blog, you probably have archive pages that list your most recent posts for each category. WordPress includes this functionality out-of-the-box.
Often these pages will include a sample of the text from each post. The same block of text then appears on at least 2 pages of your site (the post itself and the archive page), which results in duplicate content.
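If you want to see this overlap for yourself, a rough check is to take the opening words of a post and look for them on the archive page. The Python sketch below does that with plain strings. The function names, the 10-word excerpt length, and the sample texts are all invented for illustration, and a real check would first need to pull the visible text out of each page.

```python
def excerpt(text, num_words=10):
    """Return the first num_words words of text, joined by single spaces."""
    return " ".join(text.split()[:num_words])

def shares_excerpt(post_text, archive_text, num_words=10):
    """True if the post's opening words also appear on the archive page."""
    # Collapse whitespace so line breaks in the archive do not hide a match.
    normalized_archive = " ".join(archive_text.split())
    return excerpt(post_text, num_words) in normalized_archive

# Hypothetical sample texts.
post = (
    "Duplicate content can sneak into a site through its own settings. "
    "Here is how to spot it and what to do."
)
archive = (
    "Recent posts\n\n"
    "Duplicate content can sneak into a site through its own settings. "
    "Here is ... Read more"
)

print(shares_excerpt(post, archive))
```

A True result means the archive page repeats the start of the post word for word, which is exactly the kind of identical block described above.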
Conclusion
I hope this post gives you a good idea of what “duplicate content” means, how it works, and how it can sneak up on you. Most website owners have good intentions and try to follow the rules, but your site itself may be causing this problem without you knowing it!
Check out how duplicate content can hurt your site.
Or jump over to check out some steps to resolve duplicate content on your blog archive pages.