Website Migration SEO Checklist: Sitemap and Robots.txt Checks Before Launch

Collin D Johnson

Why sitemap and robots.txt checks matter during a migration

Google can usually discover pages through links, but migrations create a temporary discovery problem.

URLs change. Templates change. Internal links move. Old paths redirect. New pages launch. Some pages should be indexed. Some should not. The sitemap and robots.txt file tell crawlers where to spend attention and where not to waste it.

They do not fix a weak site structure. They do not replace internal linking. They do not make irrelevant pages valuable.

They do help Google crawl the new site efficiently while it is comparing old URLs, new URLs, redirects, canonicals, and page content.

Google’s site move guidance is clear on the bigger point: rankings can fluctuate while Google recrawls and reindexes a migrated site. A medium-sized site can take a few weeks for most pages to move in Google’s index, and larger sites can take longer. The goal is not to avoid every fluctuation. The goal is to avoid preventable confusion.

The migration SEO checklist

Use this before launch, on launch day, and again after Google has had time to crawl the new site.

| Check | What to verify | Why it matters |
| --- | --- | --- |
| XML sitemap | Only canonical, indexable URLs are included | Prevents Google from wasting crawl attention on dead or duplicate URLs |
| Robots.txt | Important sections are not blocked | A single staging rule can keep production pages from being crawled |
| Redirect map | Every valuable old URL has a relevant destination | Preserves user paths and helps search engines understand the move |
| Canonicals | Canonical tags point to the final preferred URLs | Avoids mixed signals between old paths, new paths, and duplicate templates |
| Internal links | Navigation and body links point to final URLs | Sitemaps help discovery, but internal links still carry context |
| Status codes | Live pages return 200, redirects return 301 or 308, removed pages return 404 or 410 | Search engines need honest responses |
| Search Console | Properties, sitemaps, and coverage monitoring are ready | You need early evidence, not postmortem archaeology |

1. Generate the XML sitemap from production routes, not hopes

A sitemap should reflect the site you actually want indexed.

That sounds obvious until a migration ships with:

  • staging URLs in the sitemap;
  • redirected legacy URLs still listed;
  • filtered CMS preview pages included;
  • gated resources mixed with public pages;
  • hundreds of thin tag pages created by a new CMS;
  • missing high-value pages because they were built outside the main route tree.

For a custom design and development project, the sitemap should be generated from the same source of truth as the production routing layer. If the CMS controls blog posts, case studies, and landing pages, those collections should feed the sitemap. If the application has static marketing routes, those routes should be represented too.

A clean sitemap entry should usually be:

  • live;
  • canonical;
  • indexable;
  • internally linked;
  • useful enough to earn a place in search.

Do not include URLs just because they exist. Include URLs because they should be found.
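
One way to enforce that rule is to filter the route inventory before any XML is written. The sketch below is a minimal illustration, not a drop-in generator: the `Page` record and its fields are assumptions standing in for whatever your CMS or routing layer actually exposes.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    """Hypothetical record for one route; adapt the fields to your own source of truth."""
    url: str        # absolute production URL
    canonical: str  # URL the page declares as canonical
    status: int     # HTTP status the page returns
    noindex: bool   # True if the page carries a noindex directive


def sitemap_candidates(pages: list[Page]) -> list[Page]:
    """Keep only pages that are live, self-canonical, and indexable."""
    return [
        p for p in pages
        if p.status == 200 and not p.noindex and p.canonical == p.url
    ]


def build_sitemap(pages: list[Page]) -> str:
    """Serialize the filtered pages as a sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in sitemap_candidates(pages):
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")
```

Redirected, noindexed, and non-canonical routes never reach the output, so the sitemap cannot drift away from the filter. Whether a page is "useful enough" is still an editorial call that code will not make for you.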

2. Remove staging crawl blocks before launch

This is the classic migration mistake because it is so easy to miss.

During development, teams often block staging with robots.txt or noindex tags. That is fine. The problem is copying those settings into production.

Before launch, check:

  • robots.txt does not include Disallow: / for production;
  • key page groups are not blocked accidentally;
  • CSS, JavaScript, and image assets required for rendering are crawlable;
  • page-level noindex tags have been removed from production pages that should rank;
  • environment-specific headers are not sending X-Robots-Tag: noindex.
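
The noindex items in that list can be checked per page once you have the HTML and the response headers in hand. This is a rough sketch using only the Python standard library; the function name `noindex_sources` is made up for illustration, and fetching the page is left to whatever HTTP client you already use.

```python
from html.parser import HTMLParser


def _has_noindex(value: str) -> bool:
    """True if a robots directive string contains noindex (or none, which implies it)."""
    tokens = {t.strip() for t in value.lower().replace(":", ",").split(",")}
    return bool(tokens & {"noindex", "none"})


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            self.directives.append(attr.get("content") or "")


def noindex_sources(html: str, headers: dict[str, str]) -> list[str]:
    """Return where a noindex signal comes from; an empty list means none was found."""
    sources = []
    parser = RobotsMetaParser()
    parser.feed(html)
    if any(_has_noindex(d) for d in parser.directives):
        sources.append("meta robots")
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and _has_noindex(value):
            sources.append("X-Robots-Tag header")
    return sources
```

Run it against production responses, not staging ones. The header check is the part teams tend to miss, because it never shows up in the page source.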

Robots.txt is mainly a crawler traffic control file. It is not a reliable privacy or deindexation system. If a page must not appear in Google, use noindex, authentication, or remove the page. Blocking a URL in robots.txt stops Google from crawling the page, but the URL can still appear in results if other sites link to it. A robots.txt block also hides any noindex tag on that page, because Google has to crawl the page to see the tag.

That distinction matters during migrations. If you block the wrong pages, you may stop Google from seeing the very signals it needs to process the move.
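
A quick way to catch an accidental block is to test a list of must-crawl URLs against the robots.txt text you are about to ship. The sketch below uses Python's standard `urllib.robotparser`. Treat it as a first-pass check only: that parser does not replicate Googlebot's wildcard handling or rule precedence, so anything surprising should be confirmed in Search Console. The function name and the sample URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs from `urls` that this robots.txt disallows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # A staging rule that was copied into production by mistake.
    leaked_staging_rules = "User-agent: *\nDisallow: /\n"
    must_crawl = ["https://example.com/", "https://example.com/services"]
    print(blocked_urls(leaked_staging_rules, must_crawl))
```

Feed it the same high-value URLs you prioritize in the redirect map. An empty result is what you want to see before launch.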

3. Map redirects by intent, not convenience

A migration redirect map is not a spreadsheet chore. It is the contract between old demand and new structure.

Every valuable old URL should point to the most relevant new URL. Not the homepage. Not the nearest category if a better match exists. Not a generic landing page because it was faster.

Prioritize:

  1. revenue pages;
  2. pages with backlinks;
  3. pages with organic traffic;
  4. pages used in sales, onboarding, or support;
  5. content that still matches current positioning.

If a page is genuinely gone and there is no relevant replacement, return a proper 404 or 410. Redirecting deleted content to an unrelated page creates a worse experience and a weaker search signal.

For full-stack launches, redirects should be tested at the infrastructure level, not only in the CMS. That means checking edge middleware, hosting platform rules, application redirects, trailing slash behavior, lowercase rules, and HTTP-to-HTTPS behavior together.
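
Before testing live infrastructure, it helps to lint the redirect map itself. The following is a small sketch that assumes the map is a plain dictionary of old path to new path; it flags loops, chains, and redirects that land on the homepage. It cannot judge relevance, which still needs a human.

```python
def resolve(redirects: dict[str, str], url: str) -> tuple[str, int]:
    """Follow `url` through the map and return (final_url, hop_count).

    Raises ValueError if the redirects loop back on themselves.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen:
            raise ValueError("redirect loop")
        seen.add(url)
    return url, hops


def audit_redirects(redirects: dict[str, str], homepage: str = "/") -> list[tuple[str, str]]:
    """Return (source, problem) pairs for loops, chains, and homepage redirects."""
    issues = []
    for source in redirects:
        try:
            final, hops = resolve(redirects, source)
        except ValueError:
            issues.append((source, "loop"))
            continue
        if hops > 1:
            issues.append((source, f"chain ({hops} hops)"))
        if final == homepage:
            issues.append((source, "sent to homepage"))
    return issues
```

A clean audit here only proves the spreadsheet is consistent. The edge, hosting, and application layers can still add their own hops, which is why the live test matters.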

4. Check canonical tags against the new URL model

Canonical tags are supposed to reduce ambiguity. During migrations, they often create it.

Common problems include:

  • new pages canonicalizing back to old URLs;
  • HTTP canonicals on HTTPS pages;
  • staging domains left in templates;
  • duplicate slash and non-slash versions;
  • paginated or filtered pages using the wrong canonical logic;
  • CMS previews leaking canonical tags into production.

The fix is boring, which is why it works: crawl the site before launch and compare the canonical URL for each page against the intended production URL.

If a page is indexable and important, its canonical should usually point to itself unless there is a deliberate consolidation strategy.
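
The comparison can be automated per page with the standard library. This sketch assumes you already have each page's HTML and the URL you intend it to live at; `canonical_issue` is an illustrative name, and the exact-match comparison is deliberately strict so that slash and protocol mismatches surface.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_issue(html: str, expected_url: str) -> Optional[str]:
    """Return a description of the canonical problem, or None if the tag matches."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return "missing canonical"
    if len(parser.canonicals) > 1:
        return "multiple canonicals"
    found = parser.canonicals[0]
    if found != expected_url:
        return f"canonical points to {found}"
    return None
```

Pages with a deliberate consolidation strategy will show up as mismatches. Keep an allowlist for those rather than loosening the check.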

5. Point internal links at final URLs

A redirect can save a user. It should not become your internal linking strategy.

After launch, internal links should point directly to final URLs. Navigation, footer links, blog links, case study links, CTAs, schema URLs, XML sitemap entries, hreflang tags, and canonical tags should all agree.

This is especially important on sites with a custom CMS model. A migration may preserve content while changing how relationships are stored. If internal links are embedded manually in rich text, old paths can survive long after the templates are fixed.

Run a crawl and look for:

  • internal 3xx links;
  • internal 404s;
  • links to staging domains;
  • mixed HTTP and HTTPS links;
  • old CMS paths;
  • orphaned important pages.

Search engines can follow redirects. Humans can too. But every unnecessary redirect adds delay, noise, and another place for a launch bug to hide.
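
Some of the items in that crawl list can be caught from the HTML alone, before any request is made. The sketch below flags links to staging hosts, plain-HTTP links, and legacy path prefixes. The host names and the prefix list are assumptions you would replace with your own, and finding internal 3xx links, 404s, and orphans still needs a real crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)


def audit_links(html, page_url, production_host, staging_hosts, legacy_prefixes):
    """Return (href, problem) pairs for links that should be fixed at the source."""
    parser = LinkParser()
    parser.feed(html)
    issues = []
    for href in parser.links:
        parts = urlparse(urljoin(page_url, href))  # resolve relative links
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, and similar
        if parts.hostname in staging_hosts:
            issues.append((href, "staging domain"))
        elif parts.hostname == production_host:
            if parts.scheme == "http":
                issues.append((href, "http link"))
            if parts.path.startswith(tuple(legacy_prefixes)):
                issues.append((href, "old path"))
    return issues
```

This is most useful on rich-text fields exported from the CMS, where manually embedded links tend to outlive template fixes.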

6. Submit the sitemap and monitor Search Console

Do not wait for traffic to drop before opening Search Console.

Before launch, verify the relevant properties. For a domain move, verify both old and new properties. For a protocol, hostname, or path migration, make sure the property setup matches how the site will actually resolve.

On launch day:

  • submit the new XML sitemap;
  • inspect a few high-value URLs;
  • confirm Google can fetch rendered pages;
  • watch indexing and crawl errors;
  • monitor old URLs and new URLs, not just total traffic.

After launch, review the sitemap report and indexing data. You are looking for patterns, not isolated noise.

A few temporary fluctuations are normal. A whole template group blocked by robots.txt is not.

7. Treat technical SEO as part of QA, not a final polish pass

The right time to catch migration SEO issues is before the launch window.

That means technical SEO checks should sit inside the same QA process as design review, form testing, accessibility, analytics, performance, and CMS editing. If sitemap generation, robots.txt, redirects, metadata, and structured data are handled by different people in different tools, somebody needs to own the combined release checklist.

For Virdis, this is why migration planning belongs inside the full-stack build, not bolted on after development. The content model, routes, CMS fields, frontend templates, hosting rules, analytics, and Search Console validation all affect each other.

A clean launch is not an accident. It is the result of fewer disconnected decisions.

Launch-day spot checks

Before DNS or production traffic moves, check these manually:

  • Visit /robots.txt on the final production domain.
  • Visit /sitemap.xml and confirm it returns a valid XML response.
  • Open several sitemap URLs and confirm they return 200.
  • Test old high-value URLs and confirm they redirect to the right new pages.
  • Confirm the homepage, services pages, case studies, blog posts, and conversion pages are indexable.
  • Confirm forms, analytics, and consent tooling work after redirects.
  • Confirm no production page points to staging in canonical tags, links, metadata, or assets.

Then repeat the checks after deployment. Preview environments lie politely. Production tells the truth.
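
Because the same checks run twice, the sitemap part is worth scripting. This is a minimal sketch: it parses the sitemap XML, confirms every entry is on the production host, and asks a status lookup for each URL. The `fetch_status` callable is a placeholder you supply, ideally a thin wrapper around your HTTP client that does not follow redirects, so a 301 is reported as a 301.

```python
from typing import Callable
from urllib.parse import urlparse
from xml.etree import ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> value from a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def spot_check(
    xml_text: str,
    production_host: str,
    fetch_status: Callable[[str], int],
) -> list[tuple[str, str]]:
    """Return (url, problem) pairs for sitemap entries that are off-host or not 200."""
    problems = []
    for url in sitemap_urls(xml_text):
        if urlparse(url).hostname != production_host:
            problems.append((url, "not on production host"))
            continue
        status = fetch_status(url)
        if status != 200:
            problems.append((url, f"returned {status}"))
    return problems
```

Run it once against the preview build and once against production, and compare the two lists. Any difference is a launch bug you found before Google did.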

What to do if traffic drops after migration

Do not start changing everything at once.

First, isolate the failure mode:

  • Did indexed URLs fall, or did rankings move?
  • Are important pages blocked by robots.txt?
  • Did canonical tags point to the wrong URLs?
  • Are redirects missing, chained, or irrelevant?
  • Did the sitemap submit successfully?
  • Are internal links still pointing to old paths?
  • Did analytics tracking change during the launch?

If tracking changed, separate measurement loss from organic loss. If redirects failed, fix those before rewriting content. If templates are blocked or noindexed, remove the block and request reprocessing on the most important URLs.

The fastest recovery usually comes from fixing the specific technical break, not from publishing a panic blog post about the keyword that dipped.

Frequently asked questions

Should every page be in the XML sitemap?

No. The sitemap should include canonical, indexable URLs that matter. Exclude redirects, duplicate filters, thin archives, gated pages, staging URLs, and pages you do not want search engines to prioritize.

Can robots.txt keep private pages out of Google?

No. Robots.txt manages crawler access, but it is not a privacy system. Use authentication, noindex where appropriate, or remove the page entirely if it should not appear in search.

When should we submit a new sitemap after a migration?

Submit it on launch day after the production URLs, redirects, canonicals, and robots.txt file are live. Then monitor Search Console for sitemap processing, indexing issues, and crawl errors.

How long does SEO recovery take after a website migration?

It depends on site size and crawl frequency. Google says medium-sized moves can take a few weeks for most pages to move in the index, while larger sites can take longer.

Can internal links keep pointing to old URLs if redirects are in place?

No. Redirects protect old paths, but internal links should point directly to final URLs after launch. Clean internal links reduce crawl waste and make the new structure clearer.
