
Start with the launch URL inventory
Do not start by opening the XML file. Start with the URL inventory.
Your team needs one source of truth for every URL that should exist at launch:
- primary marketing pages
- product or service pages
- blog posts
- case studies with approved proof
- legal pages
- gated or ungated resource pages
- utility pages that deserve indexing
Add the intended canonical URL for each page. Then mark whether each URL should appear in the sitemap.
This catches the first common problem: the sitemap includes whatever the CMS or framework can find, not what the business wants indexed.
A launch sitemap should include canonical pages you want Google to index. It should exclude drafts, filtered views, test pages, thank-you pages, duplicate variants, internal search results, and low-value archive pages.
If a page helps a qualified buyer understand the offer, compare options, evaluate cost, or trust the company, keep it in the review set. If the page exists because a plugin generated it, question it.
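As a minimal sketch of what that inventory could look like in code, assume each row records the launch URL, its intended canonical, and the team's sitemap decision. The `InventoryRow` type, its field names, and the example.com URLs are illustrative assumptions, not a required format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryRow:
    url: str          # URL as it should exist at launch
    canonical: str    # intended canonical URL for the page
    in_sitemap: bool  # the team's explicit decision, not the CMS default


def sitemap_urls(rows):
    """Return self-canonical URLs marked for the sitemap, without duplicates."""
    seen = set()
    result = []
    for row in rows:
        # A variant whose canonical points elsewhere does not belong in the sitemap.
        if row.in_sitemap and row.url == row.canonical and row.url not in seen:
            seen.add(row.url)
            result.append(row.url)
    return result


def inventory_conflicts(rows):
    """Rows marked for the sitemap even though their canonical points elsewhere."""
    return [row for row in rows if row.in_sitemap and row.url != row.canonical]
```

Anything returned by `inventory_conflicts` is a decision the team still has to make: either the page is canonical, or it stays out of the sitemap.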
Match sitemap URLs to production routes
Next, compare each sitemap URL against the production routing layer.
For a custom site, this means checking the routes your frontend serves and the entries your CMS publishes. The sitemap should not depend on a manual list someone forgets after launch.
Check for:
- HTTP 200 responses on every sitemap URL
- no staging, preview, or localhost URLs
- no redirected URLs inside the sitemap
- no letter-case mismatches between sitemap URLs and the routes the site serves
- no trailing slash inconsistencies
- no URL parameters unless the page needs them indexed
A sitemap can contain a URL that resolves yet still points at the wrong version of the page, such as a different host, letter case, or trailing-slash variant. That creates crawl waste and weakens the canonical signal your site sends.
The fix is not to patch the XML by hand. Fix the route source, CMS query, or sitemap generator so the same mistake does not return next month.
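Several of the checks above need no network request at all. The sketch below lints a single sitemap URL using only the standard library. The production host and the no-trailing-slash policy are assumptions for illustration; swap in whatever your URL plan specifies. It does not check HTTP status, which needs a real request against production.

```python
from urllib.parse import urlsplit

# Assumptions for illustration: the production host and a no-trailing-slash policy.
PRODUCTION_HOST = "www.example.com"


def lint_sitemap_url(url):
    """Return the static problems found in one sitemap URL (empty list means clean)."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme != "https":
        problems.append("not https")
    if parts.hostname != PRODUCTION_HOST:
        # Catches staging, preview, and localhost hosts.
        problems.append("non-production host")
    if parts.path != parts.path.lower():
        problems.append("uppercase in path")
    if parts.path != "/" and parts.path.endswith("/"):
        problems.append("trailing slash")
    if parts.query:
        problems.append("query parameters")
    return problems
```

Running a lint like this inside the sitemap generator's test suite is one way to keep the fix at the source rather than in the XML.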
Check robots.txt before you submit anything
Robots.txt and sitemap files often live next to each other, but teams treat them like separate chores. They are not.
Before you submit a sitemap, check whether robots.txt blocks any path that the sitemap includes.
Look for rules that block:
- /blog/
- /resources/
- /case-studies/
- /api/ paths used by rendered content
- image or asset paths required for page rendering
- an entire staging pattern copied into production
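Robots.txt tells crawlers where they can go. It does not remove a URL from Google by itself: a blocked URL can still be indexed if other pages link to it, and a crawler that cannot fetch the page never sees its noindex rule. Robots.txt also does not replace canonical tags. Use it for crawl access, not as a cleanup drawer.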
If your sitemap asks Google to crawl a URL and robots.txt blocks that URL, your launch process has a contradiction. Resolve it before launch, not after Search Console starts reporting noise.
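One way to catch that contradiction before launch is to run every sitemap URL through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`. Note the assumption: this parser applies simple prefix rules in file order and does not reproduce every detail of how Google matches rules (wildcards, for example), so treat it as a first pass and confirm anything surprising in Search Console.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, urls, agent="Googlebot"):
    """Return the sitemap URLs that the given robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]
```

For example, with the rule `Disallow: /blog/` under `User-agent: *`, a sitemap URL such as `https://www.example.com/blog/launch-notes` would be reported as blocked, while `https://www.example.com/pricing` would not.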
Confirm canonicals agree with the sitemap
Every indexable page should point at itself with a canonical tag unless you have a specific reason to consolidate signals elsewhere.
During the sitemap audit, open a sample from each template type:
- homepage
- service page
- product page
- blog post
- case study
- resource page
- legal page
Check the canonical tag on each page. It should match the URL pattern in the sitemap.
Watch for canonicals that still point to the old domain, staging domain, HTTP version, default CMS URL, or a different slug. These mistakes travel well because templates repeat them across the site.
A custom site gives you the chance to wire canonicals into the routing and metadata system. Use that. Do not let each page invent its own answer.
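To sample templates without reading page source by hand, a small script can pull the canonical tag out of fetched HTML and compare it with the sitemap URL. This is a sketch using the standard `html.parser` module; the status labels are hypothetical names chosen for this example, and it expects exact string equality between the canonical and the sitemap URL.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_status(sitemap_url, html):
    """Compare a page's canonical tag with the URL listed in the sitemap."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonicals:
        return "missing"
    if len(finder.canonicals) > 1:
        return "multiple"
    return "match" if finder.canonicals[0] == sitemap_url else "mismatch"
```

A "mismatch" on one sampled page usually means the whole template is wrong, which is why one sample per template type goes a long way.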
Remove redirected and missing URLs
A sitemap should not act like a redirect map.
If a URL redirects, the sitemap should list the final destination, not the old URL. Search engines can follow redirects, but you make them do extra work and send a weaker signal than necessary.
Run the sitemap through a crawler or validator and flag:
- 3xx redirect responses (301, 302, 307, 308)
- 404 pages
- soft 404 pages
- blocked URLs
- server errors
- redirect chains
For a redesign or CMS migration, keep the redirect map separate. Old URLs still need redirects, but they do not belong in the new sitemap unless they remain canonical pages.
This matters for maintainability too. If your launch sitemap contains old routes, someone will trust that file in the next release and repeat the mistake.
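The flags above can be sketched as a small classifier. To keep the example self-contained, it reads from a hypothetical `responses` table that maps each URL to a status code and redirect target, standing in for what a crawler would record; the labels are likewise illustrative. Soft 404s and robots.txt blocks are not covered here, because neither can be detected from a status code alone.

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}


def classify(url, responses, max_hops=5):
    """Label one sitemap URL from recorded crawl results.

    `responses` maps URL -> (status_code, redirect_target_or_None).
    URLs absent from the table are treated as 404 for this sketch.
    Returns (label, final_url).
    """
    current, hops = url, 0
    status, target = responses.get(current, (404, None))
    # Follow redirects hop by hop so chains can be told apart from single redirects.
    while status in REDIRECT_CODES and target and hops < max_hops:
        current, hops = target, hops + 1
        status, target = responses.get(current, (404, None))

    if status in REDIRECT_CODES:
        label = "unresolved redirect"  # loop, overly long chain, or no target
    elif status >= 500:
        label = "server error"
    elif status in (404, 410):
        label = "missing"
    elif status != 200:
        label = "other"
    elif hops > 1:
        label = "redirect chain"
    elif hops == 1:
        label = "redirect"
    else:
        label = "ok"
    return label, current
```

Only URLs labeled "ok" belong in the sitemap as written. For "redirect" and "redirect chain", the returned final URL is the candidate replacement, provided it is the canonical page.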
Audit CMS collections before launch
Most sitemap mistakes on custom sites start in the CMS model.
The site may generate a sitemap from every published entry in a collection. That works only when the CMS separates public, indexable content from internal or incomplete content.
Before launch, check each collection that feeds the sitemap:
- Does the entry have a real title and meta description?
- Does the slug match the approved URL plan?
- Does the entry have a publish date if the template expects one?
- Does the entry use the right author and category references?
- Does the entry have a canonical URL if the system supports it?
- Does the frontend hide draft or archived states from the sitemap query?
A good CMS model helps non-technical teams publish without leaking unfinished content into search. A weak model leaves every launch dependent on cleanup.
For Virdis projects, this is one reason structure matters. The website should not need a developer every time the marketing team publishes a post, but it should still protect the parts that affect SEO and conversion.
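The collection checks above could be folded into the sitemap query itself, so an incomplete entry never reaches the file. The sketch below assumes entries arrive as dictionaries with `status`, `noindex`, `title`, `meta_description`, and `slug` keys. Those field names are hypothetical; map them to whatever your CMS model exposes.

```python
# Hypothetical field names; adjust to the CMS model in use.
REQUIRED_FIELDS = ("title", "meta_description", "slug")


def sitemap_eligible(entry):
    """True when a CMS entry is published, indexable, and has its required fields."""
    if entry.get("status") != "published":
        return False  # drafts and archived entries stay out
    if entry.get("noindex"):
        return False
    return all(str(entry.get(field) or "").strip() for field in REQUIRED_FIELDS)


def split_collection(entries):
    """Separate entries that may enter the sitemap from those held back for review."""
    eligible = [entry for entry in entries if sitemap_eligible(entry)]
    held_back = [entry for entry in entries if not sitemap_eligible(entry)]
    return eligible, held_back
```

The `held_back` list doubles as a pre-launch review queue: each entry in it is either unfinished content or a gap in the model.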
Check sitemap size and structure
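Small sites can use one sitemap. Larger sites often need a sitemap index that points to separate files for pages, posts, resources, products, or other collections. The sitemap protocol caps each file at 50,000 URLs and 50 MB uncompressed, so a site beyond either limit has to split.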
The structure should help crawlers and humans understand the site. It should also help your team debug problems.
A clean setup might separate:
- static marketing routes
- blog posts
- case studies
- resource pages
- product or service pages
This makes post-launch checks easier. If blog posts stop appearing, you know where to look. If the marketing routes work but CMS collections fail, you can isolate the issue fast.
Do not split files for decoration. Split them when the site structure, publishing flow, or future maintenance needs it.
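For reference, a sitemap index is a short XML file that lists the child sitemaps. The sketch below builds one with the standard `xml.etree.ElementTree` module. The child file names used in the usage note are assumptions that mirror the split suggested above, not a required naming scheme.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_locations):
    """Return sitemap index XML that points at each child sitemap URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for location in sitemap_locations:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = location
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling it with, say, `https://www.example.com/sitemap-pages.xml` and `https://www.example.com/sitemap-posts.xml` produces one `<sitemap>` element per file, so a failing collection shows up as one missing or empty child rather than a hole somewhere in a single large file.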
Submit after deployment, then monitor Search Console
Submit the sitemap after the production deployment is live and the final domain resolves.
Before submission, open the sitemap URL in the browser and confirm the production version. Then submit it in Google Search Console.
After launch, monitor:
- sitemap fetch status
- discovered URLs
- indexed pages
- crawl errors
- excluded pages that should be indexed
- indexed pages that should not exist
Search Console will not validate your whole strategy. It will show you where Google sees a mismatch. Treat those reports as launch QA, not as a monthly report someone checks after the damage settles in.
The sitemap audit checklist
Use this before a custom website launch:
- Confirm the approved launch URL inventory.
- Mark which URLs should be indexable.
- Generate the sitemap from production routes and CMS entries.
- Remove staging, preview, draft, test, and parameter URLs.
- Confirm every sitemap URL returns HTTP 200.
- Replace redirected URLs with their final canonical destinations.
- Remove 404, blocked, and soft 404 URLs.
- Check robots.txt against every sitemap path.
- Confirm canonical tags match the sitemap URL pattern.
- Sample every major template type.
- Check CMS collections for draft leakage and missing fields.
- Validate sitemap syntax.
- Submit the production sitemap in Search Console.
- Monitor coverage and indexing after launch.
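The "validate sitemap syntax" item can be partly automated. The sketch below is a basic structural check, not a full schema validation: it confirms the file parses as XML, uses the sitemap namespace, gives every entry a `<loc>`, avoids duplicates, and stays within the protocol's 50,000-URL limit per file. The problem messages are illustrative.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemap protocol


def sitemap_syntax_problems(xml_bytes):
    """Return structural problems found in one sitemap file (empty list means clean)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as error:
        return [f"not well-formed XML: {error}"]

    problems = []
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not a namespaced <urlset>")

    entries = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(entries) > MAX_URLS_PER_FILE:
        problems.append("more than 50,000 URLs in one file")

    seen = set()
    for entry in entries:
        loc = (entry.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
        if not loc:
            problems.append("url entry without <loc>")
        elif loc in seen:
            problems.append(f"duplicate <loc>: {loc}")
        else:
            seen.add(loc)
    return problems
```

An empty result only means the file is structurally sound. Whether the listed URLs are the right ones is still the job of the inventory, route, robots.txt, and canonical checks earlier in the list.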
Where this fits in a custom website project
A sitemap audit belongs near the end of the build, but earlier decisions shape the work.
If the team has no URL inventory, the sitemap audit turns into archaeology. If the CMS model has weak publish controls, the sitemap audit turns into cleanup. If the routing layer has no clear canonical logic, the sitemap audit turns into debate.
The better path: plan URL structure, CMS fields, metadata rules, and sitemap generation before content entry starts.
That is the difference between a custom website that only looks finished and a custom website your team can keep using after launch.
