5 Ways to Fix Duplicate Content Issues in SEO
Duplicate content occurs when content on a website is identical, or very similar, to content found elsewhere, whether on the same site or on other websites. There are also near-duplicate situations, where two pieces of content differ only in minor details. But the real question is: what is duplicate content in an SEO context?
The truth is that there is no duplicate content penalty. It’s a filter. Google explicitly wants unique content for its users in order to increase the quality of its index. The filter suppresses pages that aren’t sufficiently unique and displays the most valuable version based on a number of different factors. So don’t think of it as a penalty. That’s just plain wrong.
There are several myths about what duplicate content is in SEO, and being aware of them can help you deal with duplicate content more effectively.
First, let’s talk about Google’s guidelines on this matter, as well as other situations that create duplicate content not necessarily covered by the guidelines. In addition, I will provide some common fixes, along with considerations that will help you prevent creating duplicate content in the future.
Common Myths About Duplicate Content
One of the most common myths is that when duplicate content exists, it will automatically trigger a penalty. This is incorrect. As mentioned above, it is a filter. When duplicate content is used, there is no reason one should expect a better ranking, because no effort has been made to make the content valuable and unique. In addition, the credibility of the website will suffer.
Knowing that duplicate content doesn’t automatically trigger a penalty is useful. You must still understand how to avoid creating duplicate content, even unintentionally, as part of your overall SEO implementation process.
One of the best ways to avoid duplicate content is to make sure your content is unique: write your own, and don’t steal (plagiarize) anyone else’s. This is by far the easiest way to produce fresh, original content while still marketing a specific product or company.
Another way to avoid having duplicate content is by making the most out of your URLs – by making sure only one version of your URL loads, and that you aren’t using five different variations of the page URL when creating internal links.
Don’t forget that there is a difference between http://, https://, http://www., and https://www. All four are considered different URLs by Google, and if you’re linking using a mix of them, you’re creating duplicate content problems all on your own. Consolidate your URLs so that every internal link, and every canonical tag, points to the single primary protocol and hostname your site actually uses.
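To illustrate how those four variants collide, here is a minimal Python sketch (the domain and the assumption that https://www. is the preferred version are hypothetical) that maps the protocol/www variants of a URL to one consolidated form:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url, scheme="https", use_www=True):
    """Map protocol/www variants of a URL to one preferred form.

    Assumes the preferred version is https with a www. hostname;
    adjust to match whichever variant your site actually serves.
    """
    parts = urlsplit(url)
    host = parts.netloc
    if use_www and not host.startswith("www."):
        host = "www." + host
    return urlunsplit((scheme, host, parts.path, parts.query, parts.fragment))

variants = [
    "http://example.com/page",
    "https://example.com/page",
    "http://www.example.com/page",
    "https://www.example.com/page",
]
# All four variants collapse to a single consolidated URL.
print({normalize_url(u) for u in variants})  # {'https://www.example.com/page'}
```

In practice this kind of consolidation lives in your CMS templates and server-side redirects, not in a script, but the principle is the same: one page, one URL.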
Be careful with staging sites. Make sure they are noindexed, and make sure you don’t launch a site while an indexable staging copy sits on the domain. Mismanaged, indexable staging sites are another problem I see in website audits all the time, often because nobody takes ownership of these subtle but very important settings.
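A quick automated check can guard against launching an indexable staging site. The sketch below (the sample markup is hypothetical) uses only Python’s standard library to scan a page’s HTML for a robots noindex directive:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content of any <meta name="robots"> tags on a page."""
    def __init__(self):
        super().__init__()
        self.robots_directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.robots_directives.append(attrs.get("content", "").lower())

def is_noindexed(html):
    """Return True if the page carries a robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in d for d in parser.robots_directives)

staging_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
live_page = '<html><head><title>Home</title></head></html>'
print(is_noindexed(staging_page))  # True  -> staging page is safe
print(is_noindexed(live_page))     # False -> this page is indexable
```

Note that noindex can also be delivered via the X-Robots-Tag HTTP header, which a markup-only check like this won’t see.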
What Really Creates Duplicate Content?
Duplicate content is typically created in three ways:
- Substantial blocks of text across your site that are appreciably similar or identical (such as a location-based website that only swaps out a few relevant pieces of text rather than writing unique content for each page).
- Multiple URLs serving the same content.
- Poor internal linking that doesn’t account for these duplicates and links to several URL variants instead of one proper URL throughout your site.
And, of course, outright plagiarizing another website’s content (which is a huge no-no).
Google’s webmaster guidelines warn against creating thin, low-quality content pages.
While a few instances of this are unlikely to cause problems, repeating it over and over until it grows into hundreds of pages at scale can present major issues. It can also lead to keyword cannibalization, in which multiple pages compete for the same terms, lowering your ability to seriously compete in the SERPs.
In addition to duplicate content, thin content that provides little, if any, value is discouraged. Google’s webmaster guidelines state the following about discouraged types of content:
- “Automatically generated content
- Thin affiliate pages
- Content from other sources. For example: scraped content or low-quality guest blog posts
- Doorway pages”
The guidelines also offer the following advice for avoiding low-quality content that doesn’t meet their quality standards:
- “Make pages primarily for users, not for search engines.
- Don’t deceive your users.
- Avoid tricks intended to improve search engine rankings. A good rule of thumb is whether you’d feel comfortable explaining what you’ve done to a website that competes with you, or to a Google employee. Another useful test is to ask, “Does this help my users? Would I do this if search engines didn’t exist?”
- Think about what makes your website unique, valuable, or engaging. Make your website stand out from others in your field.”
There you have it.
In other words, don’t take shortcuts. These shortcuts can eventually harm you in the grand scheme of things. Focus on long-term sustainable techniques that are not black hat in nature, and you will do well.
What Should I Do if I Have Duplicate Content?
The other aspect of understanding duplicate content in an SEO context is knowing what to do when it already exists. Prevention is always the best option, but realistically it isn’t always feasible, so it’s also very important to know how to handle duplicate content when it is unintentionally created on your pages.
One way to fix duplicate content on a website is to use the rel=”canonical” tag in the head section of the page. This tag tells a search engine that the page is a copy of another page and, more importantly, points to the original version. It is also worth noting that each page should carry only one canonical annotation.
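To illustrate the “exactly one canonical per page” rule, here is a small standard-library sketch (the sample markup is hypothetical) that extracts the rel="canonical" link from a page’s head and flags ambiguous pages:

```python
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Collects the href of every rel="canonical" link tag on a page."""
    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel", "").lower() == "canonical":
            self.canonicals.append(attrs.get("href"))

def get_canonical(html):
    """Return the page's canonical URL, or raise if the annotation is ambiguous."""
    parser = CanonicalParser()
    parser.feed(html)
    if len(parser.canonicals) != 1:
        raise ValueError(f"expected exactly one canonical, found {len(parser.canonicals)}")
    return parser.canonicals[0]

page = ('<html><head>'
        '<link rel="canonical" href="https://www.example.com/widgets/">'
        '</head><body>...</body></html>')
print(get_canonical(page))  # https://www.example.com/widgets/
```

Running a check like this across a crawl is a cheap way to catch pages that carry zero or multiple canonical tags, both of which leave the choice of preferred URL up to the search engine.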
The next situation where duplicate content commonly occurs is international SEO with incorrect hreflang implementations. Using hreflang properly helps you avoid duplicate content arising from multilingual structures: the annotations let search engines treat content in multiple languages as distinct pieces of content aimed at different audiences, rather than as duplicates. A common mistake here is using incorrect language or country codes.
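A correct implementation is reciprocal: every language version lists every alternate, including itself. The sketch below (URLs and language codes are hypothetical) builds the set of hreflang link tags each version should carry:

```python
def hreflang_tags(versions, default_url=None):
    """Build the reciprocal hreflang link tags every language version must carry.

    `versions` maps hreflang codes (language, or language-REGION) to URLs.
    """
    tags = [
        f'<link rel="alternate" hreflang="{code}" href="{url}">'
        for code, url in versions.items()
    ]
    if default_url:  # fallback for users whose language isn't listed
        tags.append(f'<link rel="alternate" hreflang="x-default" href="{default_url}">')
    return "\n".join(tags)

versions = {
    "en-US": "https://www.example.com/en/",
    "es-ES": "https://www.example.com/es/",
    "de-DE": "https://www.example.com/de/",
}
print(hreflang_tags(versions, default_url="https://www.example.com/"))
```

Generating the tags from one shared mapping, rather than hand-editing each template, is what prevents the typo-in-one-version mistakes the paragraph above warns about.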
When you knowingly create a page with duplicate content, the best approach is to identify it early on so that it doesn’t affect the rankings of the original page. If you have a plan at the outset, you can set canonicals to prefer a specific version of the content over all others, telling Google which page you want treated as the preferred page. If you already have practices in place to create unique content every time, then self-referencing canonicals are the way to go. No single best practice fits every situation, however, so use your own judgment and modern SEO analysis to determine whether this is the right step for you.
You may also use noindex and nofollow to keep pages out of the index. This tells search engines to choose other, indexed pages over these, leaving the pages user-facing but excluded from your indexable site content. But beware – used improperly, these directives can cause serious problems that could otherwise have been prevented. Aggressive use of noindex and nofollow can look like PageRank sculpting, an age-old SEO technique in which SEOs attempted to “sculpt” PageRank and artificially inflate rankings through internal and external links. That technique, thankfully, has gone by the wayside in recent years.
More Advanced Methods of Checking for Duplicate Content
Crawlers scan pages and identify the different areas that contain duplicate content. There are several advanced methods available to the SEO practitioner who wants to check for duplicate content on their pages.
For example, you can use the hash values Screaming Frog records for each crawled URL. If two URLs show the same hash in the Screaming Frog results, then you most likely have duplicate content.
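The idea behind that hash check can be sketched in a few lines of Python: hash each page’s (lightly normalized) text and group URLs that share a digest. The URLs and text here are hypothetical, and real crawlers normalize far more aggressively:

```python
import hashlib
from collections import defaultdict

def content_hash(text):
    """Hash lightly normalized text so trivial whitespace differences don't matter."""
    normalized = " ".join(text.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def find_duplicates(pages):
    """Group URLs whose content hashes match; return only groups with duplicates."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[content_hash(text)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]

pages = {
    "/widgets/": "Our widgets are the best widgets on the market.",
    "/widgets/?sort=price": "Our widgets are   the best widgets on the market.",
    "/about/": "We have been making widgets since 1987.",
}
print(find_duplicates(pages))  # [['/widgets/', '/widgets/?sort=price']]
```

Hashing only catches exact (or near-exact, after normalization) duplicates; measuring partial similarity, as tools like Siteliner do, requires fuzzier techniques such as shingling.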
Another option is Moz’s crawler, which is paid but offers a 30-day trial period. A further tool, Siteliner, gives a thorough analysis of the pages that have duplicates: it measures how similar two pages are and points to the areas on each page that match.
Duplicate Content is Not Always a Huge Problem
The bottom line is that duplicate content should not be treated as a huge problem. As long as you have a good general understanding of what duplicate content is and how it works, it can be handled pretty well. It is also worth emphasizing that some duplication is almost unavoidable in SEO, so the best approach is to lean on the tools built for detecting it. Handling duplicate content doesn’t require deep expertise, and there are various guides online that can help beginners.
All in all, duplicate content on web pages simply signals room for growth, because it pushes website owners to keep improving their pages. In the context of SEO, duplicate content is yet another problem that can be solved. Just make sure you identify it and correct it before it causes issues later.