robots.txt Disallow vs noindex: What Is the Difference?
Disallow in robots.txt tells compliant crawlers not to crawl a URL path. noindex tells search engines not to keep a page in the index. They are related, but they are not the same.
The practical rule: use robots.txt to manage crawling, and use noindex when you need a page removed from search results. Do not rely on Disallow alone for deindexing.
BaseToolbox's robots.txt generator can help draft crawl rules, but indexing decisions need the right mechanism for the goal.
What Disallow Does
A robots.txt rule can look like this:
User-agent: *
Disallow: /private-drafts/
This asks crawlers not to fetch URLs under /private-drafts/. It is useful for reducing crawl of low-value paths, duplicate filters, staging-like folders, or generated pages you do not want crawled.
But if other pages link to a disallowed URL, a search engine may still learn that the URL exists and list it in results without ever having crawled its content.
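To see how a compliant crawler reads the rule from the example above, here is a minimal sketch using Python's standard-library urllib.robotparser. The example.com URLs are placeholders, and this parser only handles simple prefix rules like this one; it is not a model of any particular search engine. Note that can_fetch only answers "may this URL be crawled?". Nothing in robots.txt tells it whether a URL is indexed.

```python
from urllib.robotparser import RobotFileParser

# The rule from the example above, as a crawler would download it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() decides crawling only; it has no notion of indexing.
for url in (
    "https://example.com/private-drafts/q3-plan.html",
    "https://example.com/blog/robots-vs-noindex.html",
):
    verdict = "crawl allowed" if parser.can_fetch("*", url) else "crawl blocked"
    print(url, "->", verdict)
```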
What noindex Does
noindex is an indexing directive. It can be placed in a meta robots tag in the page's HTML or sent in an X-Robots-Tag HTTP response header. It tells search engines not to include the page in search results.
For example, the content of a meta robots tag (or the value of an X-Robots-Tag header) might be:
noindex, follow
This means the page should not be indexed, but links on it may still be followed.
For noindex to be seen, the crawler usually needs to access the page. If you block the page in robots.txt, the crawler may never see the noindex directive.
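A crawler can only act on noindex after it has fetched the response. This sketch, in Python with only the standard library, checks the two places the directive can appear: an X-Robots-Tag response header and a meta robots tag. MetaRobotsParser and page_has_noindex are hypothetical names, and the sketch deliberately ignores bot-specific forms, such as a user-agent prefix inside X-Robots-Tag or a meta tag addressed to a single crawler.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() == "robots":
            for token in values.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def page_has_noindex(headers, html):
    """True if a fetched response carries noindex in a header or meta tag.

    headers: dict of HTTP response headers; html: the response body as text.
    Both only exist after the crawler has been allowed to fetch the page.
    """
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            tokens = {token.strip().lower() for token in value.split(",")}
            if "noindex" in tokens:
                return True
    parser = MetaRobotsParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(page_has_noindex({}, '<head><meta name="robots" content="noindex, follow"></head>'))  # True
print(page_has_noindex({"X-Robots-Tag": "noindex"}, "<p>Thanks!</p>"))  # True
print(page_has_noindex({}, "<p>Thanks!</p>"))  # False
```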
Which One Should You Use?
| Goal | Better choice |
|---|---|
| Stop crawling a folder | robots.txt Disallow |
| Remove a page from results | noindex |
| Keep a private page private | Authentication, not robots.txt |
| Reduce crawl waste | robots.txt and internal link cleanup |
| Hide staging content | Password protection or noindex plus access control |
Robots rules are public. Never put sensitive paths in robots.txt and assume they are hidden.
Common Mistake
The common mistake is blocking a URL and expecting it to disappear from search. If the URL is already indexed, blocking crawl can prevent the crawler from seeing the noindex you later add.
A safer deindexing sequence is often:
- Allow the crawler to access the page.
- Add noindex.
- Wait for recrawl and removal.
- Then decide whether crawl blocking is still needed.
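The sequence above can be modeled as a toy check of what a compliant crawler is able to learn about one URL. This is an illustrative sketch, not how any search engine works: crawler_view and its return labels are hypothetical, and the noindex_on_page flag stands in for whatever the live page actually serves. It does not model recrawl timing or removal tools.

```python
from urllib.robotparser import RobotFileParser


def crawler_view(robots_txt, url, noindex_on_page, user_agent="*"):
    """Toy model of what a compliant crawler can learn about one URL.

    noindex_on_page stands in for whether the live page serves noindex.
    The crawler can only observe it if robots.txt lets it fetch the page.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # Never fetched, so a noindex on the page is never seen.
        return "blocked: noindex not seen"
    if noindex_on_page:
        return "crawled: noindex seen, page can drop out of results"
    return "crawled: no noindex, page stays eligible for indexing"


URL = "https://example.com/old-thank-you/"
BLOCKED = "User-agent: *\nDisallow: /old-thank-you/\n"
OPEN = "User-agent: *\nDisallow:\n"  # an empty Disallow allows everything

# Common mistake: noindex added while crawl is already blocked.
print(crawler_view(BLOCKED, URL, noindex_on_page=True))
# Safer sequence: keep crawl open until noindex has been seen.
print(crawler_view(OPEN, URL, noindex_on_page=True))
```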
For urgent removals, use the search engine's removal tools where appropriate.
AI Crawler Note
How AI crawlers read and honor robots rules also varies by platform. If your goal is AI citation, blocking relevant pages can reduce citation opportunities. If your goal is preventing training or access, document the business choice clearly.
Use separate rules for different bots when needed, and review them whenever your content strategy changes.
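Separate rules are written as separate User-agent groups. Under the Robots Exclusion Protocol, a crawler that matches a named group follows that group instead of the * group, so shared rules have to be repeated. In this sketch ExampleAIBot and SomeSearchBot are made-up user-agent names and the paths are placeholders; parsing again uses Python's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# ExampleAIBot and SomeSearchBot are made-up user-agent names.
# A bot that matches its own group does not also read the * group,
# so the shared /private-drafts/ rule is repeated in ExampleAIBot's group.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /reports/
Disallow: /private-drafts/

User-agent: *
Disallow: /private-drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("ExampleAIBot", "SomeSearchBot"):
    for path in ("/reports/2024.html", "/private-drafts/plan.html", "/blog/"):
        allowed = parser.can_fetch(agent, "https://example.com" + path)
        print(f"{agent:<14} {path:<26} {'allowed' if allowed else 'blocked'}")
```

If the repeated Disallow line were dropped from the ExampleAIBot group, that bot would be allowed into /private-drafts/, which is easy to miss when reviewing rules.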
Private Content Needs Access Control
Neither Disallow nor noindex is a security feature. If a page must be private, protect it with authentication, authorization, or remove it from the public web.
This matters for staging sites, draft reports, internal tools, customer files, and admin paths. A URL blocked in robots.txt can still be guessed, linked, logged, or opened by someone with the address.
Think of robots rules as crawler instructions, not locks.
Audit Questions
Ask these before changing rules:
- Do we want to stop crawl or stop indexing?
- Is the page already indexed?
- Does the crawler need to see noindex?
- Are internal links still pointing to blocked paths?
- Is any sensitive URL being advertised in robots.txt?
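Two of these questions can be partly automated. This rough sketch assumes you already have the live robots.txt text and a list of internal link paths from a site crawl; the audit function, the sample data, and the SENSITIVE_HINTS keyword list are all hypothetical. A keyword match only flags paths for human review; it cannot prove that a path is safe to list.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical inputs: the live robots.txt and internal link targets
# exported from a site crawl. SENSITIVE_HINTS is an illustrative keyword list.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-drafts/
Disallow: /admin/
"""
INTERNAL_LINKS = ["/blog/", "/private-drafts/q3-plan.html", "/pricing/"]
SENSITIVE_HINTS = ("admin", "internal", "customer", "staging")


def audit(robots_txt, internal_links, base="https://example.com"):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Internal links that still point at blocked paths.
    blocked_links = [p for p in internal_links if not parser.can_fetch("*", base + p)]

    # Disallow paths whose names look like they advertise something sensitive.
    advertised = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow":
            path = value.strip()
            if any(hint in path.lower() for hint in SENSITIVE_HINTS):
                advertised.append(path)
    return blocked_links, advertised


blocked, advertised = audit(ROBOTS_TXT, INTERNAL_LINKS)
print("Internal links to blocked paths:", blocked)
print("Possibly sensitive paths in robots.txt:", advertised)
```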
The right answer depends on the page's current state, not only the desired state.
Example Scenarios
For faceted search pages that waste crawl budget, Disallow may be appropriate. For an old thank-you page that appears in search results, noindex is the better tool.
For a staging site, neither is enough by itself. Use authentication or IP restrictions first, then add noindex as a backup if the pages are reachable.
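As a sketch of that layering, here is a small WSGI middleware in Python that requires HTTP Basic authentication and adds an X-Robots-Tag: noindex header to every response. The credentials, the realm, and the names protect_staging and demo_app are placeholders. Basic auth should only be served over HTTPS, and IP restrictions or your hosting platform's access controls may fit better.

```python
import base64
import hmac

# Placeholder credentials for illustration; load real ones from a secret store.
STAGING_USER = "staging"
STAGING_PASSWORD = "change-me"


def protect_staging(app):
    """Wrap a WSGI app: require HTTP Basic auth, then mark responses noindex."""
    token = base64.b64encode(f"{STAGING_USER}:{STAGING_PASSWORD}".encode()).decode()
    expected = ("Basic " + token).encode()

    def wrapped(environ, start_response):
        supplied = environ.get("HTTP_AUTHORIZATION", "").encode()
        # Access control comes first: without valid credentials, no content is served.
        if not hmac.compare_digest(supplied, expected):
            start_response("401 Unauthorized", [
                ("Content-Type", "text/plain"),
                ("WWW-Authenticate", 'Basic realm="staging"'),
                ("X-Robots-Tag", "noindex"),
            ])
            return [b"Authentication required"]

        def start_with_noindex(status, headers, exc_info=None):
            # Backup layer: if a page is ever reachable, ask engines not to index it.
            return start_response(status, list(headers) + [("X-Robots-Tag", "noindex")], exc_info)

        return app(environ, start_with_noindex)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in for the real staging application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"staging page"]


application = protect_staging(demo_app)
```

To try it locally, wsgiref.simple_server.make_server("", 8000, application).serve_forever() from the standard library can serve it.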
For deleted content, return the right HTTP status or redirect intentionally. Robots rules are not a cleanup plan for broken URLs.
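For retired URLs, the decision can live in one small lookup. This sketch assumes two hypothetical tables, MOVED and GONE, and returns a 301 redirect for content that moved, 410 Gone for content removed on purpose, and 404 Not Found for anything else it does not recognize. It covers retired paths only; live pages would be served by your normal routing.

```python
from http import HTTPStatus

# Hypothetical data: retired paths that moved, and paths removed on purpose.
MOVED = {"/old-guide/": "/guides/robots-txt/"}
GONE = {"/2019-promo/"}


def retired_url_response(path):
    """Return (status, Location header value or None) for a retired path."""
    if path in MOVED:
        # The content lives on elsewhere: redirect permanently to the new URL.
        return HTTPStatus.MOVED_PERMANENTLY, MOVED[path]
    if path in GONE:
        # Deliberately removed with no replacement.
        return HTTPStatus.GONE, None
    # Not a path we recognize.
    return HTTPStatus.NOT_FOUND, None


for path in ("/old-guide/", "/2019-promo/", "/never-existed/"):
    status, location = retired_url_response(path)
    print(path, int(status), status.phrase, location or "")
```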
FAQ
Does Disallow remove a page from Google?
Not reliably. It blocks crawling, not indexing. Use noindex for removal from results.
Is robots.txt private?
No. It is public. Anyone can visit /robots.txt.
Can I use both Disallow and noindex?
Sometimes, but be careful. If Disallow prevents crawling, the crawler may not see noindex.