Page indexing

Last updated 18 June 2026

Indexing is the process by which search engines store and organise web pages so they can be retrieved in response to search queries. A page that is indexed can appear in search results. A page that is not indexed cannot — no matter how good its content or how many backlinks it has.

The indexing process

Google and other search engines process pages in three stages:

1. Crawling

A crawler (Googlebot, in Google's case) discovers and fetches the URL. Discovery happens through:

  • Internal links from already-known pages.
  • External backlinks.
  • XML sitemaps.
  • Direct submissions via Search Console.
  • Redirects from other URLs.

2. Rendering

Google loads the page in a headless Chromium browser, executes its JavaScript, and produces the final rendered HTML. Content that is injected by JavaScript only becomes visible to Google at this stage.

3. Indexing

Google analyses the rendered content, decides whether the page meets its quality and relevance thresholds, and either adds it to the index or leaves it out.

A page can be crawled and rendered successfully and still not be indexed if it falls short of those thresholds.

Reasons a desirable page might not be indexed

Technical reasons (common errors, within your control)

  • A noindex directive in the page's robots meta tag or in an X-Robots-Tag HTTP header.
  • Blocked by robots.txt — Google can't crawl the page.
  • Canonical URL pointing elsewhere — Google indexes the canonical URL, not this one.
  • Page returns a non-200 status code: 4xx and 5xx errors prevent indexing, and a 3xx redirect means Google indexes the destination URL rather than this one.
  • Soft 404: the page returns 200, but Google judges that it isn't real content (for example, an empty listing or a "not found" message).
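The technical checks above can be scripted. The Python sketch below assumes you have already fetched a page and have its status code, response headers and HTML; `indexability_issues` and `HeadSignals` are hypothetical names for illustration. It looks only for the signals listed here (status code, X-Robots-Tag header, robots meta tag, canonical link) and cannot detect soft 404s, which are Google's own judgement.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadSignals(HTMLParser):
    """Collects robots meta directives and the canonical link from an HTML document."""

    def __init__(self):
        super().__init__()
        self.robots = []       # content of every <meta name="robots"> or <meta name="googlebot">
        self.canonical = None  # href of the last <link rel="canonical"> seen

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            self.robots.append(attr.get("content", "").lower())
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href") or None


def indexability_issues(url, status, headers, html):
    """Return the technical signals that could stop `url` from being indexed."""
    issues = []
    if status != 200:
        issues.append(f"status {status}")
    # Header names are case-insensitive, so normalise before looking up X-Robots-Tag
    robots_header = {k.lower(): v for k, v in headers.items()}.get("x-robots-tag", "")
    if "noindex" in robots_header.lower():
        issues.append("noindex in X-Robots-Tag header")
    parser = HeadSignals()
    parser.feed(html)
    if any("noindex" in directive for directive in parser.robots):
        issues.append("noindex in robots meta tag")
    if parser.canonical:
        canonical = urljoin(url, parser.canonical)  # resolve relative hrefs
        # Simplification: a trailing-slash difference is not counted as a conflict
        if canonical.rstrip("/") != url.rstrip("/"):
            issues.append(f"canonical points to {canonical}")
    return issues
```

An empty list means none of these blockers were found; it does not mean Google will index the page.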

Quality and relevance reasons (Google's discretion)

  • "Crawled — currently not indexed" — Google fetched the page but decided not to include it. Common reasons: thin content or duplication.
  • "Discovered — currently not indexed" — Google knows the URL exists but hasn't crawled it yet. Common on large sites where crawl budget is an issue, or on pages with weak internal linking.
  • "Duplicate, Google chose different canonical than user" — Google identified a different page as the canonical version of this content.
  • Manual actions or quality issues affecting the site as a whole.

Getting pages indexed

1. Make sure they're crawlable and indexable

  • Allow in robots.txt.
  • Return a 200 status.
  • No noindex directive.
  • No conflicting canonical.
  • Reachable via internal links from already-indexed pages.
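For the robots.txt item, Python's standard-library `urllib.robotparser` gives a quick first check. The robots.txt content and URLs below are made-up examples, and `crawl_permissions` is a hypothetical helper. This parser follows the original robots.txt rules: it does not support Google's `*` and `$` path wildcards, and its rule precedence can differ from Google's, so treat the result as a sanity check rather than a guarantee. Search Console's robots.txt report shows how Google itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; replace with your site's real file
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
Disallow: /search
"""


def crawl_permissions(robots_txt, urls, user_agent="Googlebot"):
    """Map each URL to True if robots.txt lets `user_agent` fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    urls = [
        "https://example.com/products/shoes",
        "https://example.com/staging/home",
        "https://example.com/search?q=boots",
    ]
    for url, allowed in crawl_permissions(ROBOTS_TXT, urls).items():
        print("allowed" if allowed else "BLOCKED", url)
```

To check a live site instead, `RobotFileParser("https://example.com/robots.txt")` followed by `.read()` fetches and parses the real file.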

2. Make sure they're worth indexing

Google's quality systems lean toward pages that:

  • Have substantive, original content.
  • Cover the topic better than competing pages.
  • Are linked to internally and externally.
  • Show signs of user engagement (view time, return visits).

Pages that don't meet these bars often get classified as "Crawled — currently not indexed" and stay in limbo. The fix is usually to improve the content, increase internal links, or consolidate with better-performing pages.
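To find candidates for improvement or consolidation, a rough triage can flag short pages and near-duplicate pairs. This is not how Google evaluates content: it swaps in two common heuristics, a word count and word-shingle Jaccard similarity, and both thresholds (`MIN_WORDS`, `NEAR_DUPLICATE`) are hypothetical values to tune for your site. The input is assumed to be each page's main body text, without navigation and footer.

```python
import re
from itertools import combinations

MIN_WORDS = 300        # hypothetical cut-off; Google publishes no minimum word count
NEAR_DUPLICATE = 0.8   # hypothetical Jaccard similarity threshold


def words(text):
    return re.findall(r"\w+", text.lower())


def shingles(text, k=5):
    """Set of overlapping k-word sequences ("shingles") in the text."""
    w = words(text)
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}


def jaccard(a, b):
    """Shared shingles divided by all distinct shingles (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def review_candidates(pages):
    """pages: {url: main body text}. Returns (thin URLs, near-duplicate URL pairs)."""
    thin = [url for url, text in pages.items() if len(words(text)) < MIN_WORDS]
    sets = {url: shingles(text) for url, text in pages.items()}
    pairs = [(a, b) for a, b in combinations(pages, 2)
             if jaccard(sets[a], sets[b]) >= NEAR_DUPLICATE]
    return thin, pairs
```

Comparing every pair is quadratic, which is fine for a few thousand pages; larger sites usually switch to an approximate method such as MinHash.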

3. Submit a sitemap to Search Console

An XML sitemap doesn't guarantee indexing, but it does:

  • Surface deep URLs that Google would otherwise take much longer to discover.
  • Provide last-modification dates (<lastmod>), which Google uses when they are consistently accurate. Google ignores <changefreq> and <priority>, so update-frequency hints have no effect.
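A minimal sitemap containing only `<loc>` and `<lastmod>` can be generated with Python's standard library. The page list and dates are placeholders and `build_sitemap` is a hypothetical helper; in practice each `lastmod` should reflect when the page's content last changed meaningfully, not when the file was regenerated.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: iterable of (absolute URL, date of last meaningful change)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = loc  # ElementTree escapes & in query strings
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()  # YYYY-MM-DD
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("https://example.com/", date(2026, 6, 1)),
        ("https://example.com/guides/page-indexing", date(2026, 6, 18)),
    ]))
```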

4. Strengthen internal linking

Pages with few internal links tend to be crawled less often and treated as less important. Pages with no inbound links are unlikely to be indexed at all.
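Weak or missing internal links can be found from a link graph of your own site. The sketch assumes a hypothetical input, `links`, mapping each page path to the internal paths it links to (for example, exported from a site crawl), plus a list of known URLs from your sitemap or CMS. A breadth-first search from the homepage gives each page's click depth; known URLs it never reaches are orphans.

```python
from collections import deque


def click_depths(links, start="/"):
    """Breadth-first search over internal links: {url: clicks needed from start}."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def inbound_link_counts(links):
    """Number of distinct pages linking to each URL (self-links ignored)."""
    counts = {}
    for source, targets in links.items():
        for target in set(targets) - {source}:
            counts[target] = counts.get(target, 0) + 1
    return counts


def orphan_pages(links, known_urls, start="/"):
    """Known URLs that no chain of internal links from `start` reaches."""
    return sorted(set(known_urls) - set(click_depths(links, start)))
```

Pages with a high click depth or a low inbound count are the ones to link to from stronger, already-indexed pages.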

Crawl budget and indexing speed

Google has a finite "crawl budget": the number of URLs it is able and willing to fetch from a site in a given timeframe. Budget spent on duplicates, parameterised URLs and low-quality pages leaves less attention for the URLs that matter.

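For small and mid-size sites, crawl budget is rarely a real constraint. It becomes meaningful on large sites: Google's own crawl-budget guidance is aimed at sites with over a million unique pages, or over 10,000 pages whose content changes daily.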
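One way to see where crawl budget leaks is to collapse parameter variants in a list of crawled or discovered URLs. The sketch assumes a hypothetical `IGNORED_PARAMS` set: parameters that, on this example site, never change the page content. Which parameters qualify is site-specific, so the set must be adapted before the output means anything.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that never change page content on this site
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "sessionid"}


def normalise(url):
    """Drop ignored parameters, sort the rest, lowercase the host, drop the fragment."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                  if k.lower() not in IGNORED_PARAMS)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


def duplicate_groups(urls):
    """{normalised URL: [variants]} for every URL fetched under more than one form."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}
```

Large groups point to parameter spaces worth consolidating with canonical tags or clean internal links.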

Common indexing mistakes

  • Staging environments accidentally indexed.
  • Production launched with site-wide noindex still in place from staging.
  • Sitemap full of non-indexable URLs (redirects, 404s, noindexed or canonicalised pages). This produces confusing Search Console reports and dilutes the sitemap's signal about which URLs matter.
  • Not monitoring indexing changes. A sudden drop in indexed pages usually indicates a recent change (template update, robots.txt edit, migration) that broke something.
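A sitemap can be audited against your own crawl data to catch the non-indexable entries described above. The `crawl` dictionary is a hypothetical input (URL mapped to status code, noindex flag and declared canonical), for example produced by a site crawler; `non_indexable_entries` is an illustrative name.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """All <loc> values from a sitemap <urlset>."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def non_indexable_entries(xml_text, crawl):
    """crawl: hypothetical {url: {"status": int, "noindex": bool, "canonical": str or None}}."""
    problems = {}
    for url in sitemap_urls(xml_text):
        page = crawl.get(url)
        if page is None:
            problems[url] = "not found in crawl"
        elif page["status"] != 200:
            problems[url] = f"status {page['status']}"
        elif page["noindex"]:
            problems[url] = "noindex"
        elif page["canonical"] and page["canonical"] != url:
            problems[url] = f"canonicalised to {page['canonical']}"
    return problems
```

Every URL this returns is a candidate for removal from the sitemap or for a fix on the page itself.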

Periodic indexing check

Any website with regular updates, changing products, or ongoing development work should be checked periodically to make sure no individual pages or whole sections have been accidentally indexed or deindexed by an update or error.
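A periodic check can be as simple as comparing two snapshots. The sketch assumes each snapshot is a hypothetical `{url: is_indexable}` dictionary recorded from your own crawls at two points in time; the function names and the 10% alert threshold are illustrative assumptions, not standard values.

```python
def indexability_changes(previous, current):
    """Compare two {url: is_indexable} snapshots and group what changed."""
    report = {"newly_blocked": [], "newly_indexable": [], "new_and_indexable": [], "disappeared": []}
    for url, indexable in current.items():
        before = previous.get(url)
        if before is None:
            # New URL: worth a look if indexable (e.g. a staging page that leaked)
            if indexable:
                report["new_and_indexable"].append(url)
        elif before and not indexable:
            report["newly_blocked"].append(url)
        elif not before and indexable:
            report["newly_indexable"].append(url)
    report["disappeared"] = [url for url in previous if url not in current]
    return {key: sorted(urls) for key, urls in report.items()}


def sudden_drop(previous, current, threshold=0.1):
    """True if the number of indexable URLs fell by at least `threshold` (hypothetical 10%)."""
    before = sum(previous.values())
    after = sum(current.values())
    return before > 0 and (before - after) / before >= threshold
```

Running this after each release or template change surfaces accidental noindex tags and leaked URLs before they show up as lost traffic.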

Disclaimer: All information contained herein is for informational purposes only. It does not constitute advice or instruction.