Using multiple sitemaps to analyse indexation on large sites

by Patrick Altoft / 30 responses

One of the easy wins in improving search traffic to a large site is to improve indexation. Indexation isn’t about the raw number of pages indexed; it’s about increasing the percentage of real, high-value pages that are indexed.

Forcing Google to index useless pages that won’t get any traffic isn’t going to help things.

Indexation is quite a straightforward issue: every site has an indexation cap based on a number of factors, including:

  • PageRank
  • Trust
  • Site / server speed
  • Duplicate content

The last one is hard to explain, but basically if Google sees loads of pages that are the same then it probably won’t bother to do as deep a crawl of the site as it would if it found a lot of high-value unique pages.

Monitoring indexing using the site: command every month is good, and looking at the number of pages that receive at least one visitor each month is better, but both of these methods just look at the site as a whole. What we need is a method of breaking the numbers down so we can see which types of pages are not indexed and figure out how to improve things.

Multiple sitemaps

This is where multiple sitemaps come in: rather than using one giant sitemap, we like to use a separate sitemap for each type of page on the site.

That way we can look at the number of pages indexed for each page type and immediately see, for example, that 76% of product pages are indexed but only 43% of the lower-level paginated category pages are.
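
As a rough sketch of that comparison, the Python below turns submitted and indexed counts per sitemap into percentages and lists the weakest page types first. The class name, file names and counts are all made up for illustration (the counts are simply chosen to reproduce the 76% and 43% figures above); in practice the numbers would come from whatever your webmaster tools report for each sitemap.

```python
from dataclasses import dataclass


@dataclass
class SitemapStats:
    """Submitted and indexed URL counts for one sitemap (hypothetical input)."""
    name: str
    submitted: int
    indexed: int

    @property
    def indexed_pct(self) -> float:
        # Guard against an empty sitemap to avoid dividing by zero.
        if self.submitted == 0:
            return 0.0
        return 100.0 * self.indexed / self.submitted


def worst_first(stats: list[SitemapStats]) -> list[SitemapStats]:
    """Order sitemaps so the least-indexed page types come first."""
    return sorted(stats, key=lambda s: s.indexed_pct)


# Illustrative numbers only, chosen to match the percentages in the text.
stats = [
    SitemapStats("products.xml", submitted=10_000, indexed=7_600),
    SitemapStats("category-pages.xml", submitted=2_000, indexed=860),
]

for s in worst_first(stats):
    print(f"{s.name}: {s.indexed_pct:.0f}% indexed ({s.indexed}/{s.submitted})")
```

Sorting worst-first puts the page types that most need attention at the top of the list.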

Once you can diagnose exactly which types of pages Google doesn’t want to index, you can fix the issue by improving PageRank flow to those pages and adding more unique content.

Some ideas for the type of pages you might like to look at separately:

  • New products this month
  • Top selling products
  • Pages in French/English/German etc
  • Products that have not been selling
  • Blog posts from a particular month/year
  • Product pages
  • Category pages
  • Paginated category pages (page 2 of 10 etc)
  • Products in a certain category

Thanks to John from web development leeds for the screenshot.

Patrick Altoft is Director of Search at Branded3, a Leeds SEO & Digital Agency specialising in SEO, Web Design, Development & Social Media.

Comments

Read the 24 comments below, or add your own!

March 29, 2010 at 12:02am

Brilliant post, Patrick. Really useful info and it will be put to work immediately :)

Thanks a lot!

March 29, 2010 at 9:24am

This is an interesting way to increase SEO rankings, and although it appears to require more work and analysis than normal, the results seem to outweigh the costs.

Grouping pages and creating an individual sitemap for each group type is a good way to analyse how Google sees your site, and develop its page ranking further. This process would obviously be easier for smaller sites, but I wonder how it could be implemented on ecommerce sites with many products and categories.

I’ll be interested to see the comments regarding this, and how many try and implement it. Maybe some feedback on their results in the future would be a useful resource to see how successful this process is.

March 29, 2010 at 9:32am

Interesting angle – an excellent method of segmenting conversion rates on different parts of your website. I like to do something similar: hosting multiple blogs on the same domain, which lets me submit each blog to blog directories and create a new sitemap for each blog.

March 29, 2010 at 11:04am

It’s quite a new perspective on the use of sitemaps. Will try and see what results I get.

Tim
March 29, 2010 at 4:13pm

Great post. Will be trying this out.

Ian
March 29, 2010 at 7:23pm

One addition: If you use an index sitemap, then Google Webmaster Tools does the work for you, showing you the sub-maps and the indexation of each.

March 29, 2010 at 7:54pm

Ian, can you expound on what you mean by “index sitemap?” That sounds interesting.

Ian
March 29, 2010 at 8:51pm

Google lets you create a single, central sitemap that points to other ones. You can search for ‘index sitemap’ and I think you’ll find it.
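
For reference, a sitemap index is just another small XML file whose entries point at the individual sitemaps. A minimal Python sketch of building one follows; the child sitemap URLs are hypothetical, and only the `sitemapindex`, `sitemap` and `loc` elements from the sitemaps.org protocol are used.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls: list[str]) -> bytes:
    """Return a sitemap index document listing each child sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical child sitemaps, one per page type.
children = [
    "https://www.example.com/sitemap-product.xml",
    "https://www.example.com/sitemap-category.xml",
]
print(build_sitemap_index(children).decode("utf-8"))
```

Submitting only the index file should then be enough for the tool to list each child sitemap separately.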

March 30, 2010 at 4:04am

Here’s my experience with a very large site – 74 million pages. Google decided that 72 million of them were duplicate content owing to substantial content being the same, although technically unique. There was one index file and several thousand individual sitemaps. The low-quality pages were in distinct sitemaps, but this did not get the remaining sitemaps crawled substantially. Google did honour the priority hint to some degree (the low-quality pages were 0.2, while the better pages were given a higher priority hint).

Then we removed the low-quality sitemaps. No noticeable improvement to the indexing of the ‘good’ pages. Then we removed the low-quality pages even though there was no sitemap for them. Things are slowly picking up now re indexing.

Jordan
March 30, 2010 at 11:34pm

Can anyone else comment on what Ash stated? I would love to hear whether the existence of thin pages, even if they are not linked to, can harm a site.

April 1, 2010 at 5:57pm

I note your comment that “if Google sees loads of pages that are the same then it probably won’t bother to do as deep a crawl of the site” and wonder if you have any evidence of this.

I was under the impression that crawl depth was correlated with page rank so if (for example) you have a PR4 home page leading to a PR2 catalogue page that, in turn, has a thousand PR0 product pages coming from it, why would the presence of those product pages make any difference to crawl depth?

April 2, 2010 at 12:54am

Jordan, a troubling discovery is Google’s recent remark that you can’t rely on robots.txt to keep a page out of the index. You can see this in one of John Mueller’s replies in Google Groups. We had tried to keep the spiders away from the substantially duplicate pages, but the pages were not falling out of the index, at least not at the expected speed.

At large sites one can’t make substantial or radical changes overnight, which makes SEO at large sites more challenging.

April 6, 2010 at 7:17pm

@Richard – In an interview with Matt Cutts by Eric Enge, Matt mentions that if Googlebot sees lots of duplicate content on a site, then it may not do a deep crawl of that site.

April 8, 2010 at 10:44am

@Ben – I had read the interview with Mr. Cutts and this is what I was alluding to. The impression I get is that crawl depth is less a ‘site-wide’ thing than a page-based thing, i.e. the decision/frequency to crawl deeper from any one page will be dependent on the page rank of that page. I agree that if you have a lot of duplicate content sub-pages these will be low PR and are unlikely to get crawled often, but my experience does not indicate that having these will have any negative impact on the crawl rate of the pages closer to the root. So what I’m saying is that, from a site perspective, a shallow crawl of a deep site with lots of product pages may be no worse, and possibly better, than a deep crawl of a shallow site without the extra pages.

However if you also have a few yet deeper pages that you really want indexed then I can see there may be a reason for the ‘sculpting’ system you outline above.

April 10, 2010 at 2:56pm

Sitemap is gooood !!!

April 17, 2010 at 6:25am

I was also under the impression that crawl depth was correlated with page rank. The impression I get is that crawl depth is less a ‘site-wide’ thing than a page-based thing, i.e. the decision/frequency to crawl deeper from any one page will be dependent on the page rank of that page.

May 7, 2010 at 11:47am

Yes, nice article. I definitely will try this for some of my websites. Thank you.

August 27, 2010 at 4:43pm

Hi, your tips are really useful.

September 13, 2010 at 9:54pm

Generally, how big does the site have to be to require multiple sitemaps? Thanks from Steve,
@tigervinci
We handle Web design in Bellevue
sorry the last post links didn’t work

September 20, 2010 at 10:10am

Steve, when the number of URLs reaches 50,000 or the file size of the sitemap.xml file reaches 10 MB, you should start your next one. http://www.sitemaps.org/faq.php#faq_sitemap_size
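
A minimal Python sketch of splitting on the URL count is shown below. It only enforces the 50,000-URL limit; the file-size limit is not checked, and the figure quoted above may since have changed, so check sitemaps.org for the current value. The file naming scheme and the example URLs are made up for illustration.

```python
from itertools import islice

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from sitemaps.org


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs."""
    iterator = iter(urls)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def plan_sitemap_files(page_type, urls, size=MAX_URLS_PER_SITEMAP):
    """Map hypothetical file names such as 'sitemap-product-1.xml' to URL lists."""
    return {
        f"sitemap-{page_type}-{n}.xml": chunk
        for n, chunk in enumerate(chunk_urls(urls, size), start=1)
    }


# 120,000 made-up product URLs split into files of 50,000, 50,000 and 20,000.
urls = [f"https://www.example.com/product/{i}" for i in range(120_000)]
plan = plan_sitemap_files("product", urls)
for filename, chunk in plan.items():
    print(filename, len(chunk))
```

Keeping the page type in each file name means the per-type indexation breakdown still works after a large sitemap has been split.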

October 20, 2010 at 3:46pm

Very handy information,
Thank you

October 20, 2010 at 3:46pm

Quite handy,
thanks

May 19, 2011 at 8:20am

Hi Patrick,

Good contribution; please keep contributing many more new topics. Can you recommend any tool or software that creates multiple sitemaps?

October 18, 2011 at 3:37pm

That’s a great post Patrick which is still very useful.

There is an interesting post on SEOmoz arguing that multiple XML sitemaps increased indexation, and hence traffic, and a very useful one on Distilled.

However, I’ve found that this post is more inspiring and complete. Keep up the good work with more awesome posts like this one.
