Pre-launch Indexability Checklist (Robots, Noindex, X-Robots-Tag)
Informational article in the How to Consolidate Duplicate Content During a Migration topical map (Migration Implementation & QA content group). Includes 12 copy-paste AI prompts for ChatGPT, Claude & Gemini covering the SEO outline, body writing, meta tags, internal links, and Twitter/X & LinkedIn posts.
The Pre-launch Indexability Checklist confirms that robots.txt, meta noindex, and X-Robots-Tag settings do not prevent intended pages from being crawled or indexed, and verifies robots.txt rules against the Robots Exclusion Protocol (REP, proposed in 1994 and standardized as RFC 9309). The checklist inspects three layers for correct directives, response codes, and canonical behavior: robots.txt at the site root, the HTTP X-Robots-Tag header for non-HTML resources, and the HTML meta robots tag. Typical verifications include a live curl -I header check, Google Search Console URL Inspection for rendering and indexability status, and a crawl simulation to ensure consolidated URLs are reachable before launch. The objective is to avoid accidental suppression of migrated or consolidated content.
Mechanically, indexability works because robots.txt implements the Robots Exclusion Protocol by instructing bots which paths to avoid, while meta robots and X-Robots-Tag deliver indexing directives with the resource itself when it is fetched. Tools like Screaming Frog and Google Search Console reveal crawlability and meta/header signals, and HTTP tooling such as curl or wget validates server-side X-Robots-Tag responses. For migration-focused QA, this crawlability checklist should include fetching /robots.txt from the site root to confirm directives, and testing canonical chains, redirects, and status codes so that consolidated pages are not blocked. The duplicate-content migration workflow benefits from staging checks that mirror production hostnames and from recording differences between live headers and the pre-launch robots configuration. Checks should also cover user-agent-specific rules and basic sitemap alignment.
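The robots.txt layer above can be sanity-checked offline before launch. A minimal sketch using Python's standard-library robotparser — the rules, host, and paths are hypothetical placeholders, not a prescribed configuration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the post-migration site.
rules = """\
User-agent: *
Disallow: /old-category/
Allow: /products/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Consolidated targets must stay fetchable; retired paths should be blocked.
print(rp.can_fetch("*", "https://example.com/products/widget"))    # True
print(rp.can_fetch("*", "https://example.com/old-category/page"))  # False
```

In a real pre-launch run, the rules string would be fetched from the staging or production root rather than hard-coded, and the URL list would come from the migration's redirect map.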
An important nuance is that blocking with robots.txt is not equivalent to deindexing: search engines may index a URL without crawling it if external links point to it, so relying solely on robots.txt can leave obsolete or duplicate URLs visible in results. Equally problematic is combining a robots.txt disallow with meta noindex on the same URL, because the disallow prevents the crawler from fetching the page and thus from ever discovering the noindex directive. For non-HTML assets such as PDFs, images, and attachments, the X-Robots-Tag header is required to control indexation; forgotten X-Robots-Tag rules commonly cause legacy files to remain indexed after a relaunch. In an ecommerce category consolidation this can surface as product pages indexed under old faceted URLs despite canonical and redirect plans. A pre-launch audit should capture header evidence for each of these cases.
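The disallow-plus-noindex conflict described above is mechanical and therefore easy to flag automatically. A sketch, again with invented rules and URLs, that cross-checks the list of pages slated for meta noindex against the robots.txt rules:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /legacy/
"""

# URLs we intend to drop from the index via meta noindex (hypothetical).
noindex_targets = [
    "https://example.com/legacy/old-page",
    "https://example.com/duplicates/a",
]

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

for url in noindex_targets:
    if not rp.can_fetch("*", url):
        # The disallow blocks the fetch, so the noindex is never discovered.
        print(f"CONFLICT: {url} is disallowed in robots.txt")
```

Any URL the script flags needs either the disallow removed (so the noindex can be crawled) or a different removal mechanism such as a redirect.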
The practical takeaway is to verify three control points during pre-launch: accessible /robots.txt directives, visible meta robots or X-Robots-Tag responses for pages and non-HTML assets, and live-crawl confirmation that canonical and redirect targets are fetchable and return the intended status codes. Execution can use automated runs in Screaming Frog or other site crawlers, scripted curl -I header checks for batches of URLs, and Google Search Console index status reports to reconcile discrepancies. Recording the pre-launch state and re-checking immediately after the switchover minimizes regressions in site migration indexability. This page provides a structured, step-by-step framework.
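The scripted header checks mentioned above can feed a small parser that records status code and X-Robots-Tag per URL. A sketch that reads a raw header dump as produced by curl -sI (the sample response is invented):

```python
def robots_signals(raw_headers: str):
    """Extract (status code, X-Robots-Tag value) from a raw `curl -sI` dump."""
    lines = raw_headers.strip().splitlines()
    status = int(lines[0].split()[1])  # e.g. "HTTP/1.1 200 OK" -> 200
    tag = next(
        (line.split(":", 1)[1].strip()
         for line in lines[1:]
         if line.lower().startswith("x-robots-tag:")),
        None,
    )
    return status, tag

# Example against a captured response for a legacy PDF (contents assumed):
sample = "HTTP/1.1 200 OK\r\nContent-Type: application/pdf\r\nX-Robots-Tag: noindex, nofollow"
print(robots_signals(sample))  # (200, 'noindex, nofollow')
```

Running this over saved responses for the priority-URL sample, both pre- and post-launch, gives the recorded evidence the checklist calls for.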
- Work through prompts in order — each builds on the last.
- Click any prompt card to expand it, then click Copy Prompt.
- Paste into Claude, ChatGPT, or any AI chat. No editing needed.
- For prompts marked "paste prior output", paste the AI response from the previous step first.
prelaunch indexability checklist
Pre-launch Indexability Checklist
authoritative, practical, evidence-based
Migration Implementation & QA
Technical SEOs, content & migration managers, and developers preparing a site migration or major relaunch; intermediate to advanced knowledge; goal is to prevent indexability issues before launch
A concise, prescriptive pre-launch checklist focused specifically on Robots, meta noindex, and X-Robots-Tag issues within the broader duplicate-content migration workflow, with actionable verifications, tooling commands, and edge-case rules for e-commerce and multilingual sites
- robots.txt
- noindex
- X-Robots-Tag
- crawlability checklist
- pre-launch SEO checklist
- indexation tags
- site migration indexability
- Relying solely on robots.txt disallow to prevent indexing — robots.txt can block crawling but not indexing if other sites link to the pages.
- Confusing noindex with disallow — applying both to the same URL means search engines may never fetch the page and therefore never see the X-Robots-Tag or meta noindex directive.
- Forgetting X-Robots-Tag for non-HTML resources (PDFs, images, attachments) so these assets remain indexed after migration.
- Leaving staging or dev environments crawlable because robots.txt is not properly configured or is ignored by developers.
- Not testing headers with curl or an HTTP inspector; authors assume meta tags exist but forget server-side header overrides (CDN, proxy).
- Neglecting hreflang and canonical interactions with noindex and X-Robots-Tag for multilingual sites, causing loss of preferred-language pages.
- Using broad robots.txt prefixes or wildcards that unintentionally block whole directories — per the REP, Disallow: /product also matches /product-images/, so use the trailing slash (/product/) when only that directory is intended.
- When testing X-Robots-Tag, always use curl -I (or curl -sI to suppress the progress meter) to inspect the raw header from the production origin (not via CDN) and include example commands in the article: curl -sI https://example.com/page | grep -i x-robots-tag.
- Recommend a two-step pre-launch verification: automated crawler run (Screaming Frog or Sitebulb) plus manual header checks for a sample of 50 priority URLs (homepage, category pages, top 10 product pages, canonical targets).
- Advise adding temporary server-side logging to capture 200 responses for pages expected to be noindexed to confirm search engine agents can fetch the page and see the directive before launch.
- For e-commerce faceted navigation, suggest using canonical+noindex for parameterized pages and provide exact robots.txt and X-Robots-Tag examples instead of vague guidance.
- Include a rollback playbook section: if a noindex is accidentally left on live pages, the fastest recovery is to remove the noindex and force a re-crawl via Search Console URL Inspection for priority pages.
- Prefer X-Robots-Tag over meta noindex for non-HTML assets and include a short server configuration example (Apache header set, Nginx add_header) tailored to common setups.
- Use targeted site:example.com queries in Google to quickly validate whether critical sections are indexed post-launch, then track with Search Console coverage reports and the URL Inspection API for larger sites.
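The server configuration examples suggested in the tips above might look like the following sketches. The .pdf pattern and directive values are assumptions to adapt per site, and any CDN or proxy layer must be checked separately since it can override origin headers:

```nginx
# Nginx (assumed pattern): send noindex for PDF assets via mod_http_headers.
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, nofollow" always;
}
```

```apache
# Apache equivalent via mod_headers (must be enabled).
<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

After deploying either rule, confirm with a live header check (curl -sI against a sample PDF) rather than trusting the configuration file alone.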