Canonical URLs and Why You Should Care About Them

So what are canonical URLs? For a number of reasons, some sites can display the exact same page under several different URLs. That creates problems for search engines, which try to solve them by working out which URL is the canonical, or master, URL for a page. They can then filter out all the other URLs that have exactly (or nearly) the same content and provide better results for their users. There is an excellent write-up on this by Matt Cutts of Google on his web site.
An example:
All of these URLs have the same content, but would be considered unique pages to a search engine without extra work:

  • http://domain.com/
  • http://www.domain.com/
  • http://www.domain.com/index.html
  • http://domain.com/index.html

So why do I care?

Right about now you might be thinking, ‘well, this is a problem for the search engine, not me’. Not really: a search engine’s problems become the webmaster’s problems when they affect the traffic it sends to your site.
Problem 1: Lost link data
Say half the pages on your site point to your home page with http://domain.com/ and the other half point to http://domain.com/index.html. Since Google and others will filter out one version based on their algorithm, the text of the links pointing to the filtered page won’t help it rank. Worse, say 90% of your pages do it one way and 10% the other, but the search engine picks the less-linked page as the canonical one. The same goes for external sites that link to you: if you don’t fix things, some of their links will go to filtered pages. You’ve just lost the power of all those links.
Problem 2: The wrong URL gets chosen as canonical
This isn’t always a big deal. It matters most when a site rewrites its URLs into friendlier ones: if the ugly version is already in the index, it will usually not be replaced by the new one, even if you change all your internal site links.

What you can do

So it’s a good idea to make it clear to search engines which URL is the canonical one. How do you do it?
Step 1: 301 Redirects
A 301 redirect is a response from a web server that tells browsers and search engines that a page has a new location and that the change is permanent. In Apache, this is simple to set up. Create a file called .htaccess in the root directory of your site and add a line like:
Redirect 301 /foo http://foobar.com/foo
This would redirect users who entered the site at /foo to the new URL http://foobar.com/foo. I would add a few lines to handle potential ‘site root’ canonical problems like the ones listed above, since they are by far the most common and problematic. For example:

Redirect 301 /index.html http://foobar.com/
Redirect 301 /index.htm http://foobar.com/
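
Note that http://foobar.com and http://foobar.com/ need no rule of their own. A browser sends the same request (for the path /) in both cases, and the first argument to Redirect has to be a URL path beginning with a slash, not a full URL.
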
I recommend using the trailing slash version (http://domain.com/) as the canonical home page, since that is typically how others will link to you, and it lets you change server technologies without needing a redirect. You may need to do the same for subdirectories or other pages, but that varies from site to site.
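
The Redirect lines above do not cover the www versus non-www hostnames from the earlier list. One way to handle those is with mod_rewrite in the same .htaccess file. What follows is a minimal sketch, not a drop-in configuration: it assumes mod_rewrite is enabled, that foobar.com (without www) is the host you want to be canonical, and that your index files are named index.html or index.htm. The second rule checks the original request line, so it only fires when a visitor explicitly asked for the index file. On some setups a plain Redirect for /index.html can loop when the server itself serves index.html for /, in which case this rule could be used in place of those lines.

```apache
# Sketch only: assumes mod_rewrite is enabled and foobar.com is the canonical host.
RewriteEngine On

# www.foobar.com/anything -> foobar.com/anything
RewriteCond %{HTTP_HOST} ^www\.foobar\.com$ [NC]
RewriteRule ^(.*)$ http://foobar.com/$1 [R=301,L]

# Explicit requests for /index.html or /index.htm -> /
# THE_REQUEST holds the raw request line, e.g. "GET /index.html HTTP/1.1",
# so the server's internal lookup of the directory index does not trigger the rule.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.html?[\s?]
RewriteRule ^index\.html?$ http://foobar.com/ [R=301,L]
```

If you prefer the www hostname as canonical, the same pattern works with the two hostnames swapped.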
Step 2: Standardize internal links
This is really the most important thing you can do. Every link on your site should use the exact same URL for every unique page, with no exceptions. Many database-driven sites have problems with this, since they often allow URLs to be formatted in different ways that all show the same page. For example, a URL like http://bthobbies.com/product_info.php/cPath/5_225/products_id/34790 can often also be written as
http://bthobbies.com/product_info.php?cPath=5_225&products_id=34790. Pick a formatting rule and stick with it.
If you do have to change any links, keep track of all the URLs that are now discontinued, then add statements to your .htaccess file to redirect each old version to the right one. If you need to redirect a lot of pages that share syntax rules, there is another method: an Apache module called mod_rewrite. You can find tutorials and an entire site dedicated to it at doriat.com.
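
As an illustration of the mod_rewrite approach, the sketch below sends the query-string form of the product URL to the path form with a single 301. It assumes mod_rewrite is enabled, that the rule lives in the .htaccess file in the site root, and that the path form is the one you picked as canonical. The choice of form and the exact patterns are assumptions for illustration, not something the example site is known to use.

```apache
# Sketch only: assumes the path-style URL is the canonical form.
RewriteEngine On

# /product_info.php?cPath=5_225&products_id=34790
#   -> /product_info.php/cPath/5_225/products_id/34790
# %1 and %2 are the groups captured by the RewriteCond pattern;
# the trailing "?" drops the original query string from the target.
RewriteCond %{QUERY_STRING} ^cPath=([0-9_]+)&products_id=([0-9]+)$
RewriteRule ^product_info\.php$ /product_info.php/cPath/%1/products_id/%2? [R=301,L]
```

One rule like this covers every product page that follows the pattern, whereas a one-off change is usually simpler to handle with a single Redirect 301 line per old URL.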

Teaching search engines

By redirecting multiple URLs to one master URL, you save search engines the trouble of trying to figure out which one to make canonical. A link that goes to the old version will count for the new page if 301 redirected, so you don’t lose the power of any of your links.

I have been working with computers and web sites for 20+ years, and have enjoyed mastering many areas of technology. I have been building websites for about 15 years, and working with NetSuite for more than 10. I have worked with dozens of small and medium-sized companies in that time, helping them to understand and leverage the latest tools to grow their business. My business is all about helping you to maximize your business, and I prefer to establish long-term relationships with clients who are dedicated to embracing smart ways to optimize and expand their business.
