Friday 11 March 2011

Breaking down 'google' sitemaps

Just a quick follow-up to my previous post on 'google' XML sitemaps:

Once you start databinding your sitemaps, especially if you have as much data as we do on presalesadvisor, the next step is to break the sitemap down using a master index map. The index looks a lot like a normal sitemap, but uses a sitemapindex top-level element in place of urlset, and sitemap in place of url. Each location (sitemapindex - sitemap - loc) should then point to an individual sitemap file, which google etc. will then drill down into.

A good example of where this is useful is presalesadvisor, where a huge quantity of data is generated in multiple regional sets. When everything is generated in one big sitemap file, the load time is huge and risks a timeout. Instead we set up our objectdatasources with a querystring parameter for the region code (e.g. en-GB, en-US) and then link to these subsets from the main sitemapindex. With this simple change, load times and server load are greatly reduced (which matters more these days from the SEO angle, with google taking load times into account in their ranking algorithms), and we have an all-round win.

The sitemaps themselves remain exactly the same as before, and the new sitemapindex looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://public.presalesadvisor.com/Sitemap/XmlSiteMap.aspx?regionID=en-US</loc>
  </sitemap>
  <sitemap>
    <loc>http://public.presalesadvisor.com/Sitemap/XmlSiteMap.aspx?regionID=en-GB</loc>
  </sitemap>
</sitemapindex>
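
If you would rather generate the index than maintain it by hand, the idea can be sketched in a few lines. This is only a rough illustration in Python, not how the presalesadvisor pages are actually built (those are databound ASP.NET pages); the function name build_sitemap_index and the BASE_URL constant are made up for the example, and the region list is assumed to come from wherever you keep your region codes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: every regional sitemap is served from this one page,
# selected by the regionID querystring parameter.
BASE_URL = "http://public.presalesadvisor.com/Sitemap/XmlSiteMap.aspx"


def build_sitemap_index(region_codes, base_url=BASE_URL):
    """Return a sitemapindex document with one <sitemap> entry per region."""
    # Serialise the sitemap namespace as the default xmlns (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for code in region_codes:
        sitemap = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        loc = ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc")
        # urlencode escapes anything in the region code that is unsafe in a URL.
        loc.text = f"{base_url}?{urlencode({'regionID': code})}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print('<?xml version="1.0" encoding="UTF-8" ?>')
    print(build_sitemap_index(["en-US", "en-GB"]))
```

Adding a new region then only means adding its code to the list; the individual sitemaps stay as they are.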

Functionally, the only thing you do differently when logged into your google site admin is add the new sitemapindex URL as your sitemap, rather than any of the individual sitemaps.