Sitemaps

Last updated 13 May 2026

A sitemap is a file that lists the URLs on a website to help search engines discover and crawl them efficiently. The most common format is the XML sitemap, which follows the sitemaps.org protocol and is read by Google, Bing, and other crawlers.

What an XML sitemap contains

Each URL entry can include:

  • <loc> — the canonical, absolute URL (required).
  • <lastmod> — when the page last meaningfully changed. Google uses this as a hint for recrawl priority, so it should be accurate.
  • <changefreq> and <priority> — once intended as crawl hints, but website operators often set artificially high values in an attempt to game search results. Google now ignores both, so they are safe to omit.
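As a rough illustration of the entry structure described above, the following Python sketch builds a minimal XML sitemap with the standard library. The function name build_sitemap and the example.com URLs are assumptions made for this example, not part of any particular CMS or tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes for an iterable of (loc, lastmod) pairs.

    build_sitemap is a hypothetical helper. lastmod is a W3C datetime
    string such as "2026-05-13", or None to omit the element.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # required, absolute URL
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod  # optional hint
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Example usage with placeholder URLs.
xml_bytes = build_sitemap([
    ("https://example.com/", "2026-05-13"),
    ("https://example.com/about", None),
])
print(xml_bytes.decode("utf-8"))
```

The sketch leaves out <changefreq> and <priority> on purpose, in line with the note above that they are safe to omit.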

A single sitemap can hold up to 50,000 URLs and must not exceed 50 MB uncompressed, whichever limit is reached first. Larger sites split content across multiple sitemaps and reference them from a sitemap index file.

A sitemap_index.xml file might point to:

  • sitemap_posts.xml
  • sitemap_categories.xml
  • sitemap_pages.xml

And so on, splitting the URL listings into smaller, logically grouped files.
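The splitting described above can be sketched in Python as follows. This is only an illustration: the helper names chunk_urls and build_sitemap_index, and the sitemap_posts_N.xml file names, are assumptions, and the sketch checks only the 50,000 URL limit, not the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML as bytes, referencing each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Example: 120,000 placeholder post URLs need three child sitemaps.
urls = [f"https://example.com/post-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
child_sitemaps = [
    f"https://example.com/sitemap_posts_{i}.xml"  # hypothetical file names
    for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_sitemaps)
```

Each chunk would then be written out as its own sitemap file, with the index file being the one referenced in robots.txt or submitted to the search engine.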

Specialised sitemaps

  • Image and video sitemaps — surface media that may not be discoverable through HTML alone.
  • News sitemaps — for news publishers appearing in Google News; they should list only articles published within the last 48 hours.
  • Hreflang in sitemaps — an alternative to on-page hreflang tags for multilingual sites.
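To make the hreflang option more concrete, here is a hedged Python sketch that emits xhtml:link alternates inside each URL entry, which is the general shape search engines document for this approach. The helper name build_hreflang_sitemap and the example URLs are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(alternates):
    """Return sitemap XML as bytes for one page and its translations.

    `alternates` maps a language code to that language's URL. Every
    version gets its own <url> entry listing all versions, itself included.
    """
    urlset = ET.Element(
        "urlset", {"xmlns": SITEMAP_NS, "xmlns:xhtml": XHTML_NS}
    )
    for loc in alternates.values():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        for lang, href in alternates.items():
            ET.SubElement(
                url,
                "xhtml:link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Example usage with placeholder English and German URLs.
hreflang_xml = build_hreflang_sitemap({
    "en": "https://example.com/en/pricing",
    "de": "https://example.com/de/preise",
})
```

Because every language version repeats the full set of alternates, the file grows quickly on sites with many locales, which is one reason to generate it programmatically rather than by hand.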

How search engines find your sitemap

  • Reference it in robots.txt with a line like Sitemap: https://example.com/sitemap.xml.
  • Submit the sitemap URL in Google Search Console and Bing Webmaster Tools.

Referencing the sitemap in your site's robots.txt file is enough for crawlers to discover it. Submitting it in Search Console is still recommended, because Search Console reports indexing status and errors and can alert you to issues that might otherwise go unnoticed.
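As a small illustration of the robots.txt route, the sketch below reads Sitemap lines from robots.txt content using Python's standard urllib.robotparser module (its site_maps() method requires Python 3.8 or later). The helper name sitemaps_from_robots and the sample file content are assumptions for this example.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt string.

    Hypothetical helper: parses the text locally instead of fetching it,
    and returns an empty list when no Sitemap line is present.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None if none found


# Example usage with placeholder robots.txt content.
robots_txt = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""
print(sitemaps_from_robots(robots_txt))
```

A check like this can be handy for confirming that a deployed robots.txt still advertises the sitemap after a site change.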

Best practices

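  • Include only canonical, indexable URLs: no redirects, noindex pages, parameter duplicates, or 404s. Listing URLs that you otherwise tell search engines not to index sends contradictory signals.
  • Keep lastmod truthful: update it only when the page content meaningfully changes, not every time the sitemap is regenerated.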
  • Generate sitemaps dynamically from your CMS so they stay in sync with the live site.
  • Split by content type (e.g. /sitemap-products.xml, /sitemap-blog.xml). This makes indexing problems far easier to diagnose in the Google Search Console (GSC) coverage report.
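The first practice above can be expressed as a simple filter applied before the sitemap is generated. The following Python sketch assumes a hypothetical Page record with the fields shown; a real CMS or crawler would expose this information differently.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical record describing one URL on the site."""
    url: str
    status: int            # HTTP status code returned for the URL
    canonical: str         # URL declared as canonical for this page
    noindex: bool = False  # True if the page carries a noindex directive


def sitemap_eligible(page):
    """Keep only pages that return 200, are indexable, and are self-canonical."""
    return (
        page.status == 200
        and not page.noindex
        and page.canonical == page.url
    )


# Example usage with placeholder pages.
pages = [
    Page("https://example.com/shoes", 200, "https://example.com/shoes"),
    Page("https://example.com/shoes?sort=price", 200, "https://example.com/shoes"),
    Page("https://example.com/old-page", 301, "https://example.com/old-page"),
    Page("https://example.com/cart", 200, "https://example.com/cart", noindex=True),
    Page("https://example.com/gone", 404, "https://example.com/gone"),
]
eligible = [p.url for p in pages if sitemap_eligible(p)]
```

In this example only https://example.com/shoes passes: the parameter duplicate points to a different canonical, and the redirect, noindex page, and 404 are each excluded by one of the three checks.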

Why sitemaps matter

Sitemaps don't guarantee indexing or improve rankings directly, but they help search engines discover new and updated content faster — especially on large sites, or sites with weak internal linking.
