So what are canonical URLs? Due to a number of factors, some sites can display the exact same page under a bunch of different URLs, and this causes its own host of problems that search engines have to deal with. They attempt to solve it by figuring out what the canonical, or master, URL for a page is. This way they can filter out all the other pages that have exactly (or nearly) the same content, and provide better results for their users. There is an excellent write-up by Google web guy Matt Cutts on his web site.
For example, http://domain.com/, http://domain.com/index.html and http://domain.com/index.htm all have the same content, but without extra work a search engine would consider each of them a unique page.
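As a rough illustration of what "figuring out the canonical URL" involves, here is a minimal Python sketch that collapses a few common variants into one form. The function name canonical_url and the INDEX_FILES list are my own assumptions for the example; real search engines use far more signals than this.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these file names serve the same content as the directory itself.
INDEX_FILES = ("index.html", "index.htm")


def canonical_url(url: str) -> str:
    """Collapse common spellings of a URL into a single form (illustrative only)."""
    parts = urlsplit(url)
    # An empty path and "/" address the same resource on a web server.
    path = parts.path or "/"
    # Drop a trailing index file so /index.html becomes /.
    head, _, last = path.rpartition("/")
    if last in INDEX_FILES:
        path = head + "/"
    # Scheme and host name are case-insensitive; the fragment never reaches the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


variants = [
    "http://domain.com",
    "http://domain.com/",
    "http://domain.com/index.html",
    "http://DOMAIN.com/index.htm",
]
unique = {canonical_url(u) for u in variants}
print(unique)
```

All four spellings end up as the same string, which is the outcome you want a search engine to reach without having to guess.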
So why do I care?
Right about now you might be thinking, ‘well this is a problem for the search engine, not me’. Not really: their problems become the webmaster’s problems when they affect the traffic the search engine drives to your site.
Problem 1: Lost link data
Say half the pages on your site point to your home page with http://domain.com/ and the other half point to http://domain.com/index.html. Since Google and others will filter out one version based on their algorithm, the text of the links pointing to the filtered page won’t help it rank. What’s worse, say 90% of your pages do it one way and 10% the other, but the search engine makes the less-linked page the canonical one. The same goes for external sites that link to you: if you don’t fix things, some of their links will go to filtered pages. You’ve just lost the power of all those links.
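If you want to see how your own links are split, a small tally helps. The sketch below is only an illustration: the sample_pages list is made-up HTML, and LinkCollector and count_link_targets are names I chose for the example. It counts how often each spelling of a link target appears.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def count_link_targets(pages):
    """Return a Counter mapping each href spelling to the number of links using it."""
    counts = Counter()
    for html in pages:
        collector = LinkCollector()
        collector.feed(html)
        counts.update(collector.hrefs)
    return counts


# Hypothetical page bodies: three link to "/", one links to "/index.html".
sample_pages = [
    '<a href="http://domain.com/">Home</a>',
    '<a href="http://domain.com/">Home</a>',
    '<a href="http://domain.com/">Back to start</a>',
    '<a href="http://domain.com/index.html">Home</a>',
]

link_counts = count_link_targets(sample_pages)
for href, count in link_counts.most_common():
    print(f"{count:3d}  {href}")
```

Any target that shows up under more than one spelling is a candidate for the fixes described later in this post.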
Problem 2: The wrong URL gets canonized
This isn’t always a big deal. But sites often rewrite their URLs into friendlier ones, and if the ugly version is already in the index, it will usually not be replaced with the new one, even if you change all your internal site links.
What you can do
So it’s a good idea to make it clear to search engines which page is the canonical URL. How do you do it?
Step 1: 301 Redirects
A 301 redirect is a response sent from a web server (a 301 status code plus a Location header) to tell users and search engines that a page has a new location, and that the change is permanent. In Apache, it’s simple to set up. Create a file called .htaccess in the root directory of your site, if there isn’t one already, and add a line like:
Redirect 301 /foo http://foobar.com/foo
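This would redirect users who entered the site at /foo to the new URL http://foobar.com/foo. (This only makes sense when the old page lives on a different host than foobar.com, since redirecting a path to itself would loop.) I would add a few lines to handle potential ‘site root’ canonical problems like the ones described above, since they are by far the most common and problematic. For example: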
Redirect 301 /index.html http://foobar.com/
Redirect 301 /index.htm http://foobar.com/
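# No rule is needed for http://foobar.com versus http://foobar.com/: both are a request for "/", and Redirect only accepts a URL path starting with "/" as its first argument.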
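I recommend using the trailing slash version (http://domain.com/) as the canonical home page, since that is typically what others will link to you with, and it lets you change server technologies without a redirect because the URL isn’t tied to a file name. You may need to do the same for subdirectories or other pages, but that varies by site. One caution: on servers where index.html is also the directory index, a plain Redirect from /index.html to / can loop, so test these rules before relying on them.
Step 2: Standardize internal links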
This is really the most important thing you can do. Every link on your site should use the exact same URL for every unique page, no exceptions. Many database-driven sites have problems with this, since they often allow URLs to be formatted in different ways to reach the same page. For example, a URL like http://bthobbies.com/product_info.php/cPath/5_225/products_id/34790 can often also be written as http://bthobbies.com/product_info.php?cPath=5_225&products_id=34790. Pick a formatting rule and stick with it.
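To make "pick a formatting rule" concrete, here is a small Python sketch that rewrites the path-style form of the product URL above into the query-string form. It assumes you chose the query-string form as your rule; the helper name to_query_style is hypothetical, and the slash-separated key/value layout is inferred from the example URLs rather than from any documentation of the shopping cart software.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def to_query_style(url: str, script: str = "product_info.php") -> str:
    """Rewrite .../product_info.php/key/value/... as .../product_info.php?key=value&..."""
    parts = urlsplit(url)
    marker = "/" + script + "/"
    if marker not in parts.path:
        return url  # already query-style, or a different page entirely
    base, _, tail = parts.path.partition(marker)
    segments = [s for s in tail.split("/") if s]
    if len(segments) % 2:
        return url  # odd number of segments: not clean key/value pairs, leave as-is
    pairs = list(zip(segments[0::2], segments[1::2]))
    return urlunsplit(
        (parts.scheme, parts.netloc, base + "/" + script, urlencode(pairs), parts.fragment)
    )


path_style = "http://bthobbies.com/product_info.php/cPath/5_225/products_id/34790"
print(to_query_style(path_style))
```

Running every internal link through one function like this when your templates are generated is one way to keep a single spelling per page.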
If you do have to change any links, make sure to keep track of all the URLs that are now discontinued. Then add statements to your .htaccess file to redirect from each old version to the right one. If you need to redirect a lot of pages that share syntax rules, there is another method: an Apache module called mod_rewrite. You can find tutorials and an entire site dedicated to it at doriat.com.
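If you keep that list of retired URLs in a simple mapping, the .htaccess lines can be generated rather than typed by hand. This is a sketch under assumptions: the URLs in retired are invented for the example, and redirect_lines is a name I made up. One real limitation it accounts for is that the Redirect directive matches on the URL path only, so old URLs that differ only in their query string are set aside for mod_rewrite.

```python
from urllib.parse import urlsplit


def redirect_lines(retired_to_canonical):
    """Build 'Redirect 301' lines; return (lines, urls_that_need_mod_rewrite)."""
    lines = []
    needs_rewrite = []
    for old, new in sorted(retired_to_canonical.items()):
        parts = urlsplit(old)
        if parts.query:
            # Redirect cannot match a query string, so these need a rewrite rule.
            needs_rewrite.append(old)
            continue
        lines.append(f"Redirect 301 {parts.path or '/'} {new}")
    return lines, needs_rewrite


# Hypothetical retired URLs and the canonical URLs that replace them.
retired = {
    "http://foobar.com/products.htm": "http://foobar.com/products/",
    "http://foobar.com/item.php?id=7": "http://foobar.com/items/7",
}

lines, needs_rewrite = redirect_lines(retired)
print("\n".join(lines))
print("Handle with mod_rewrite:", needs_rewrite)
```

Paths containing spaces would need quoting in the generated lines, which this sketch does not attempt.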
Teaching search engines
By redirecting multiple URLs to one master URL, you save search engines the trouble of trying to figure out which one to make canonical. A link that goes to the old version will count for the new page if it is 301 redirected, so you don’t lose the power of any of your links.
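It is worth checking that each old URL really answers with a 301 and points straight at the master URL. The following Python sketch does that with the standard library's http.client, which does not follow redirects on its own. The function names are my own, and the commented-out call at the bottom uses the example host from this post; nothing here is specific to any search engine.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status_and_location(url, timeout=10):
    """Send one HEAD request and return (status code, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_clean_301(status, location, canonical):
    """True when the response is a permanent redirect pointing at the canonical URL."""
    return status == 301 and location == canonical


def check(old_urls, canonical):
    """Print one line per old URL showing whether it redirects as intended."""
    for url in old_urls:
        status, location = fetch_status_and_location(url)
        verdict = "ok" if is_clean_301(status, location, canonical) else "CHECK"
        print(f"{verdict:5}  {status}  {url} -> {location}")


# Example call (makes real HTTP requests, so it is left commented out):
# check(["http://foobar.com/index.html", "http://foobar.com/index.htm"], "http://foobar.com/")
```

A 302, or a 301 that lands somewhere other than the master URL, shows up as CHECK and is worth a second look.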