Canonical URLs and Duplicate Content

What Everyone With a Website Should Know

Canonical URLs and duplicate content are two distinct but related concepts that are important to understand and easy to get wrong. SEO basically boils down to sending search engines the signals they like, so every website owner should make sure they are handling canonical and duplicate content issues properly.

I will detail some of the issues that can come up, along with my recommendations for handling them.

Duplicate Content

When a search engine detects that two pages are similar enough in their content, it can trigger handling for duplicate content. Google has made it clear that there is no duplicate content penalty. Instead, think of it as a filter. Search engines don’t want to have more than one copy of a given page in their index and risk having two results on a SERP go to the same content. Google will try to determine which page should take precedence, whether the similar pages are within the same site or spread across multiple sites. In general, it will go with the page with the most link authority.
Examples of duplicate content issues:

  1. If you run multiple websites and sell the same item on all of them, and the descriptive text and other details are identical, search engines will probably filter out one of the domains in a SERP, even if the code for the site template for navigation is different from site A to site B.
    Best Practice: Rewrite the descriptive text and other content for your product pages for each site so they appear unique.

  2. Winnowing or faceted navigation on product pages can create a lot of URLs where the content is substantially the same, or identical. For example, sorting first by size and then by color might generate a URL like:
    http://domain.com/products?size=m&color=black
    while sorting by color first and then by size would result in:
    http://domain.com/products?color=black&size=m
    but each displays identical content. In these cases, search engines will generally filter out one of those URLs using their best guess. If they choose the URL with less link authority, it will hurt your rankings.
    Best Practice: Enforce a consistent order for the query string parameters in faceted navigation, so that size always comes before color, ensuring that the same URL is generated regardless of the order selected by the user. You can also include a noindex meta tag on some variations/search options to keep Google from trying to index them. Another option is to use a canonical URL tag in certain situations, such as when the same results can be sorted several different ways, to tell Google that the page is the same no matter how it is sorted.

  3. Site search results pages – Sometimes a search engine spider will index many search results pages for a site, which can waste much of the bandwidth allotted for indexing the site overall. Usually, site search results pages are less important than the other pages on the site, which should get more attention.
    Best Practice: Add a noindex, follow tag to the <head> of your site search results pages, like this:
    <meta name="robots" content="noindex,follow">
    This will help ensure that search engine spiders don’t waste time indexing your site search pages.

  4. International sites with the same content as the .com. Say you have a website on a .com TLD and another on a different top-level domain, like .co.uk. The .com content is targeted to US users, and the other to a British audience. In this case, having duplicate content is not a big deal, since you only want one website or the other to show up for a given search, not both. Sometimes the .com website will appear above the country-specific site in a SERP, simply because it has more link authority.
    Best Practice: Use Google Webmaster Tools to specify the target audience for each of your websites. If someone from a country that has its own site lands on the .com, use a server-side script to redirect them to the appropriate TLD (e.g. to domain.co.uk).
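
For the faceted navigation case in item 2 above, here is a rough sketch of what enforcing a consistent query string order could look like in Python. The FACET_ORDER list and the normalize_facet_url function are names I made up for illustration, and the facet names are just the ones from the example URLs. Treat it as a starting point under those assumptions, not the way any particular platform does it.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical facet order for this sketch: size always comes before color.
FACET_ORDER = ["size", "color"]


def normalize_facet_url(url: str) -> str:
    """Return the URL with its query parameters in one fixed order."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)

    def sort_key(pair):
        name = pair[0]
        # Known facets sort by their position in FACET_ORDER;
        # any other parameter goes after them, alphabetically.
        if name in FACET_ORDER:
            return (0, FACET_ORDER.index(name), "")
        return (1, 0, name)

    ordered = sorted(params, key=sort_key)
    return urlunsplit(parts._replace(query=urlencode(ordered)))


a = normalize_facet_url("http://domain.com/products?size=m&color=black")
b = normalize_facet_url("http://domain.com/products?color=black&size=m")
print(a == b)  # True: both orderings collapse to the same URL
```

If your site generates its facet links through a function like this, or redirects any out-of-order request to the normalized URL, search engines should only ever see one URL per combination of options.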

Canonical URL Issues

The canonical URL issue involves telling search engines which version of a URL to index. Depending on query string parameters and other complications, one page might be associated with many URLs. Search engines will attempt to identify the canonical, or authoritative, URL for each page. If they choose a format other than the one your site’s internal link structure uses, you will lose the link authority that the page would otherwise inherit. Unlike duplicate content, canonical URL issues happen solely within a site, not between multiple sites.
Examples of canonical URL issues:

  1. Many content management systems (CMS) use categories in generating search engine friendly URLs, like this:
    http://domain.com/shirts/hawaiian/bigblueflowers.html
    If there are multiple categories that items live in, which is common, one product detail page might end up with several URLs, e.g.
    http://domain.com/shirts/blue/bigblueflowers.html
    http://domain.com/clearance/mens/bigblueflowers.html
    Since the content on the page will be nearly identical, this becomes a canonical URL issue. Search engines will attempt to pick the URL that they think is the most authoritative, and if they pick the URL with less link authority, it will hurt your rankings.
    Best Practice: Search engines know that this is a common fact of life with database driven websites, so there is no penalty for duplicate content. To ensure the best URL is chosen, use the canonical URL tag in the <head> section. For example:
    <link rel='canonical' href='http://domain.com/shirts/blue/bigblueflowers.html'>

  2. Online advertising systems can put query string parameters at the end of URLs when sending traffic to them. Your rankings could be hurt if a search engine chooses the wrong format of the URL to index, especially if it doesn’t match the version that is used by your internal link structure.
    Best Practice: Always including the canonical tag on your static pages helps ensure that Google indexes the proper format of the URL and ignores the tracking strings. You can also specify within both Google and Bing webmaster tools which query string parameters change content and which can be ignored.
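
For the tracking parameter case in item 2 above, the sketch below shows one way a page template might work out its own canonical URL and print the tag. The parameter names in TRACKING_PARAMS are assumptions (common advertising and analytics parameters), and canonical_url and canonical_tag are hypothetical helper names. Swap in whatever your own advertising systems actually append.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracking parameters for this sketch; adjust to your ad systems.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid"}


def canonical_url(url: str) -> str:
    """Drop tracking parameters and any fragment from the requested URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))


def canonical_tag(url: str) -> str:
    """Build the <link> tag to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_tag(
    "http://domain.com/shirts/blue/bigblueflowers.html?utm_source=ad&gclid=abc"
))
```

Note that this only handles the tracking string case. When one product lives in several categories (item 1), you still have to decide which category URL is the canonical one and use that in the tag.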

How to Monitor Your Site

You can see which URLs Google has indexed for your site by typing this in a Google search box:
site:www.domain.com
If you notice any pages that are using the wrong URL format, add the canonical tag to that page with the correct URL. If you add keywords after the .com in that search, Google will show you which pages it thinks will rank for those terms. If page A and page B are considered duplicate content, one or the other should show up in the SERP, not both. Lastly, you can see the cached version of a page by typing this:
cache:www.domain.com/page.html
At the same time, Google will auto correct the URL to the version it thinks is canonical.
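
Alongside those searches, you may want to confirm that a page is actually declaring the canonical URL you expect. This is a minimal sketch using only Python's standard library. It assumes you have already saved or fetched the page's HTML as a string, and CanonicalFinder and find_canonical are names I chose for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html_text: str):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


page = (
    "<html><head>"
    "<link rel='canonical' href='http://domain.com/shirts/blue/bigblueflowers.html'>"
    "</head><body></body></html>"
)
print(find_canonical(page))
```

Comparing the result against the URL format your internal links use is a quick way to spot pages where the tag is missing or points somewhere unexpected.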

I have been working with computers and web sites for 20+ years, and have enjoyed mastering many areas of technology. I have been building websites for about 15 years, and working with NetSuite for more than 10. I have worked with dozens of small and medium-sized companies in that time, helping them to understand and leverage the latest tools to grow their business. My business is all about helping you to maximize your business, and I prefer to establish long-term relationships with clients who are dedicated to embracing smart ways to optimize and expand their business.

Comments

  • Aditya Jois August 29, 2013, 7:47 am

    Hi, thanks for putting it in such a nice way. I am running a sports-related blog and I want to display some of the wonderful articles from Ezine and Technorati (with the respective author name and a link back!) for my loyal readers.
    Does it affect SEO even if I place a noindex meta tag on the duplicate content?

    • David Norris September 1, 2013, 9:33 am

      Hello,

      Thank you for the question. If you are simply republishing an article in its entirety, then you should be safe including a noindex meta tag. If you add some of your own commentary, then you are adding to the conversation, and you don’t need to worry about the noIndex tag. You are linking back to the original source, so search engines won’t be confused about which is the authoritative/original version of that article.

      -David
