SEO Tips About XML Sitemaps
In all my years of SEO consulting, I’ve seen many clients who have misunderstood XML sitemaps. They’re a powerful tool, to be sure, but as with any powerful tool, a little training and understanding of how all the parts work can go a long way.
One of the most widespread misconceptions is that an XML sitemap helps get your pages indexed. So first, let’s get one thing straight: Google does not index your pages simply because you asked nicely. Google indexes pages because (a) it discovered and crawled them, and (b) it considers them of sufficient quality to be worth indexing.
It’s important to remember that by submitting an XML sitemap to Google Search Console, you’re telling Google that the pages in the sitemap are high-quality search landing pages that deserve to be indexed. But, like linking to a page from your main navigation, it’s only a hint that those pages are important.
One common error I often see clients make is sending Google inconsistent signals about a specific page. You’re a tease if you block a page in robots.txt and then include it in an XML sitemap. Be consistent: if you leave a page out of your XML sitemap, set its meta robots to “noindex,follow.”
In general, you want every page on your site to fit into one of these categories:
- Blocked, either via robots.txt or via meta robots “noindex,follow,” and excluded from the XML sitemap.
- Included in the XML sitemap, not disallowed in robots.txt, and free of meta robots “noindex.”
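As a concrete illustration of those two consistent states (the URL and path below are placeholders), a utility page carries the noindex tag and stays out of the sitemap, while a landing page carries no noindex and gets a sitemap entry:

```xml
<!-- State 1: utility page — in the page's <head>, and NOT in the sitemap -->
<meta name="robots" content="noindex,follow">

<!-- State 2: search landing page — no noindex tag, listed in sitemap.xml -->
<url>
  <loc>https://www.example.com/great-content-page/</loc>
</url>
```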
Site Quality As a Whole
It appears that Google measures overall site quality and uses that site-wide signal to influence rankings.
Consider this from Google’s standpoint. Imagine you have one fantastic page with great content that checks all the boxes, but the rest of the site is thin on quality. If Google sees your site as 1,000 pages of content, and only 5 or 6 of those pages are like that one excellent page, then understandably, Google wouldn’t want to direct a user to your site.
Every site has a certain number of “utility” pages that are useful to visitors but aren’t the kind of content pages that should be search landing pages: pages for sharing content with others, replying to comments, logging in, and so on.
What are you communicating to Google if your XML sitemap covers all of these pages?
Basically, you have no idea what makes for good content on your site and what doesn’t.
Instead, here’s the picture you’d like to present to Google: yes, this is a 1,000-page website, and here are the 475 great content pages out of that 1,000. You can ignore the rest; they’re just utility pages.
Let’s imagine Google crawls those 475 pages and determines that 175 of them are “A” grade, 200 are “B+,” and 100 are “B” or “B-” based on their metrics. That’s a really strong overall average, and it generally means you’re sending users to a reputable site.
Compare this to a site that submits all 1,000 pages via its XML sitemap. Now Google examines the 1,000 pages you claim are good content and discovers that more than half are “D” or “F” pages. Your site is, on average, fairly bad, and Google is unlikely to send users to a site like that.
Remember that Google will use what you submit in your XML sitemap as a clue to what’s probably important on your site. But just because a page isn’t in your XML sitemap doesn’t mean Google will ignore it. You could still have hundreds of pages with just enough content and link equity to get indexed, even though they shouldn’t be.
It’s worth doing a site: search (e.g., site:example.com) to see all of the pages Google is indexing from your site, in order to find pages you may have forgotten about that need to be cleaned up. In most cases, the weakest pages that made the index will appear last in a site: search.
Robots.txt Versus Noindex
When it comes to keeping a page out of the index, there is an important but subtle difference between using meta robots and using robots.txt.
Using meta robots “noindex,follow” allows the link equity flowing into that page to pass out to the pages it links to. Blocking the page with robots.txt, by contrast, just flushes that link equity down the toilet.
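The two approaches look like this in practice (the /utility-page/ path is a placeholder). The robots.txt rule stops Googlebot from ever fetching the page, so any equity pointing at it is stranded; the meta tag lets the page be crawled, keeps it out of the index, and still passes equity through its links:

```text
# robots.txt — Googlebot never fetches the page at all
User-agent: *
Disallow: /utility-page/
```

```html
<!-- meta robots — page is crawled and kept out of the index,
     but the links on it still pass equity -->
<meta name="robots" content="noindex,follow">
```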
Management of Crawl Bandwidth
When would it be appropriate to use robots.txt instead?
Perhaps if you’re having crawl bandwidth issues and Googlebot is spending a lot of time fetching utility pages only to discover meta robots “noindex,follow” in them and bail out. If you have so many of these that Googlebot can’t get to your important content pages, you may need to block them via robots.txt.
Cleaning up XML sitemaps and noindexing utility pages has helped a number of clients improve their rankings across the board.
If you have a core set of pages whose content changes frequently (such as a blog, new products, or product category pages) and a mass of pages (such as single product pages) that you’d also like Google to index, but not at the expense of crawling and reindexing the core pages, you can submit ONLY the core pages in the XML sitemap, to hint to Google that you value them more, and leave the less important pages out without blocking them.
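As a minimal sketch of that approach, here’s how a core-pages-only sitemap could be generated with the Python standard library. The URLs are hypothetical placeholders; the point is that only the core list ever reaches the sitemap:

```python
# Sketch: build an XML sitemap that lists ONLY the core pages.
# URLs below are hypothetical placeholders.
from xml.etree import ElementTree as ET

core_pages = [
    "https://www.example.com/blog/",
    "https://www.example.com/products/new/",
]

def build_sitemap(urls):
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap(core_pages)
```

The less important pages simply never appear in `core_pages`; they stay crawlable and indexable, just unhinted.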
Debugging an Indexing Issue
The XML sitemap really comes in handy when you submit a bunch of pages to Google for indexing but only some of them actually get indexed. Google Search Console shows you the total number of pages indexed per XML sitemap, not which pages are being indexed.
Assume your e-commerce site has 100,000 product pages, 5,000 category pages, and 20,000 subcategory pages. You submit your 125,000-page XML sitemap and discover that Google has indexed only 87,000 of them. But which 87,000?
To begin with, your category and subcategory pages are probably your most important search targets.
I’d create a category-sitemap.xml and a subcategory-sitemap.xml and submit each separately.
You should see near-complete indexation there; if you don’t, you know you need to look at adding more content to those pages, increasing their link equity, or both.
You might find that product category or subcategory pages aren’t being indexed because they just have one product (or none at all), in which case you should definitely set meta robots “noindex,follow” on those pages and pull them from the XML sitemap.
Next, divide your product pages into separate XML sitemaps by category; you can test several at once, and there’s nothing wrong with a URL appearing in multiple sitemaps.
Each XML sitemap should contain a meaningful number of pages from its category. It doesn’t have to be every page in that category, just enough to draw conclusions from the indexation rate given the sample size.
The goal is to use the overall indexation percentage of each sitemap to identify the attributes that are causing pages to be indexed or not.
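The bucketing step can be sketched in a few lines. This assumes, purely for illustration, that the category is the first segment of the URL path; the product URLs are placeholders:

```python
# Sketch: bucket product URLs into separate sitemaps by category,
# assuming the category is the first URL path segment (an assumption
# for illustration; your URL structure may differ).
from collections import defaultdict
from urllib.parse import urlparse

product_urls = [
    "https://www.example.com/widgets/blue-widget/",
    "https://www.example.com/widgets/red-widget/",
    "https://www.example.com/gadgets/mini-gadget/",
]

buckets = defaultdict(list)
for url in product_urls:
    category = urlparse(url).path.strip("/").split("/")[0]
    buckets[category].append(url)

# One sitemap file per bucket, e.g. sitemap-widgets.xml
sitemap_files = {c: f"sitemap-{c}.xml" for c in buckets}
```

Submitting each bucket separately lets you compare indexation percentages per category in Google Search Console.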
Once you’ve identified the issue, you can either improve the page content (or the links to those pages) or noindex the pages. For example, perhaps 20,000 of your 100,000 product pages have descriptions of fewer than 50 words. If these aren’t high-volume terms and you’re pulling descriptions from a manufacturer’s feed, it’s probably not worth your time to write an extra 200 words of description for each of those 20,000 pages yourself, and Google isn’t going to index pages with under 50 words of product description anyway. In that case, set meta robots to “noindex,follow” on all pages with under 50 words of product description, and don’t forget to remove them from your XML sitemap!
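That thin-content filter is simple to express in code. The page data, URLs, and the 50-word threshold here are all hypothetical, matching the example above:

```python
# Sketch: flag product pages with thin descriptions (< 50 words)
# for noindexing and removal from the sitemap. Data is hypothetical.
pages = {
    "https://www.example.com/widgets/blue-widget/": "A blue widget.",
    "https://www.example.com/widgets/red-widget/": " ".join(["word"] * 60),
}

def is_thin(description, min_words=50):
    """True if the description has fewer than min_words words."""
    return len(description.split()) < min_words

# These pages get meta robots "noindex,follow" and leave the sitemap.
noindex_pages = [url for url, desc in pages.items() if is_thin(desc)]
```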
XML Sitemaps That are Updated on a Regular Basis
There’s no need to do any of this by hand. XML sitemaps don’t have to be static files. They don’t even need a .xml extension to be submitted to Google Search Console.
Just set up rules for whether or not a page belongs in the XML sitemap, and use that same logic to set meta robots index or noindex on the page itself.
Points to Remember
- If it’s blocked in robots.txt or by meta robots “noindex,” it’s best not to include it in your XML sitemap.
- Use your XML sitemaps as investigative tools to find and fix indexation issues, and only allow/ask Google to index pages that you know Google will want to index.
- If you have a large site, use dynamic XML sitemaps rather than manually keeping robots.txt, meta robots, and XML sitemaps in sync.
Thank you for stopping by today. If you enjoyed this article you may also like: 3 Simple Ways to Speed Up Your Website