What is Robots.txt?

Robots.txt is a plain text file placed at the root of a website that tells search engine crawlers and other bots which pages or sections of the site they are allowed to access. It is the first file most crawlers check when they visit a domain. A correctly configured robots.txt file helps search engines focus their crawl attention on the pages that matter most. A misconfigured one can accidentally block Google from crawling important pages, or prevent AI search systems like ChatGPT and Perplexity from accessing content that should be cited in generated answers.


How does robots.txt work?

Robots.txt works by declaring a set of rules that tell crawlers which parts of a website they can and cannot access. The file sits at the root of the domain, accessible at yourdomain.com/robots.txt, and is checked by most well-behaved crawlers before they begin crawling any other page on the site.

The file uses a simple structure built around two main directives. The User-agent directive specifies which crawler the rule applies to. Using an asterisk applies the rule to all crawlers. Targeting a specific crawler by name, such as Googlebot or OAI-SearchBot, applies the rule only to that bot. The Disallow directive tells the specified crawler which URLs or directories it should not access. An empty Disallow rule means the crawler is allowed to access everything. A Disallow followed by a forward slash blocks the crawler from accessing the entire site.
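The two directives can be exercised with a minimal sketch using Python's standard-library urllib.robotparser. The example.com URL and the three sample files are illustrative assumptions, not taken from any real site. Python's parser is simpler than Google's (it matches plain path prefixes only), so treat its answers as an approximation.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Empty Disallow: every crawler may access everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Disallow followed by a forward slash: every crawler is blocked site-wide.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# A named group applies only to that bot; all other bots fall back to "*".
BLOCK_ONE_BOT = (
    "User-agent: OAI-SearchBot\n"
    "Disallow: /\n"
    "\n"
    "User-agent: *\n"
    "Disallow:\n"
)

PAGE = "https://example.com/services/"  # hypothetical URL

if __name__ == "__main__":
    samples = [("allow all", ALLOW_ALL), ("block all", BLOCK_ALL),
               ("block one bot", BLOCK_ONE_BOT)]
    for name, text in samples:
        rules = parse_rules(text)
        for agent in ("Googlebot", "OAI-SearchBot"):
            print(name, agent, rules.can_fetch(agent, PAGE))
```

Parsing from a string rather than fetching a live file keeps the sketch runnable offline and makes it easy to try a rule before publishing it.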

An important distinction that is frequently misunderstood is that robots.txt controls crawling, not indexing. A page that is blocked in robots.txt cannot be crawled by the specified bot. However, Google can still index a page it has never crawled if external links point to it, meaning a blocked page can still appear in search results with limited information. If the goal is to keep a page out of search results entirely, a noindex meta tag at the page level is the correct tool, not a robots.txt disallow rule. The two should not be combined on the same URL: a crawler that is blocked from fetching the page never sees the noindex tag, so the page has to remain crawlable for the tag to take effect.

On Wix, the robots.txt file is accessible and editable through the SEO dashboard. Wix generates sensible default rules but allows custom additions for site owners who need to block specific URL patterns, manage crawl budget on large sites, or control which AI crawlers can access the site. For the full technical SEO configuration that robots.txt sits within, the Wix technical SEO guide covers the complete setup including canonical tags, sitemap configuration, and indexing settings.

What should you block in robots.txt?

Robots.txt is a tool for directing crawl attention toward valuable pages and away from low-value ones. The most effective configurations block specific categories of URLs that consume crawl budget without contributing to search performance, rather than applying broad rules that risk blocking important content.

Admin and backend URLs are the clearest case for blocking. Pages like login screens, dashboard areas, and internal management tools have no value to search engines and should not appear in search results. Blocking them keeps crawlers from spending crawl budget on them. It does not secure them: robots.txt is publicly readable and only compliant bots obey it, so a disallow rule can actually advertise the paths it lists. Anything sensitive needs authentication, not a robots.txt rule.

URL parameters that generate duplicate or near-duplicate content are the second priority. Tracking parameters, session IDs, sorting options, and pagination variants all create multiple URLs with identical or near-identical content. On ecommerce sites, faceted navigation can generate thousands of unique URLs for filter combinations that serve no SEO purpose. Blocking these URL patterns with robots.txt prevents crawlers from wasting budget on content that will never rank and that may create duplicate content signals if indexed.

Thank you pages, checkout confirmation pages, and internal search result pages fall into the same category. They are useful to visitors but have no ranking value and should be excluded from crawl paths.
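As a rough illustration of these categories, the sketch below blocks a handful of low-value paths and confirms that commercial pages stay crawlable. Every path and the example.com domain are hypothetical and would need to be replaced with the site's real URL structure. Parameter-based patterns are left out because they usually rely on wildcards, which Python's standard-library parser does not evaluate.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: admin area, internal search, checkout confirmation,
# and thank-you pages are blocked; everything else stays open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /internal-search/
Disallow: /checkout/confirmation
Disallow: /thank-you
"""


def is_crawlable(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for path in ("/", "/services/seo", "/admin/login", "/thank-you"):
        print(path, is_crawlable(path))
```

Rules are prefix matches, so each Disallow path should be as specific as the site structure allows.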

The pages that need to rank should never be blocked in robots.txt. Service pages, blog posts, product pages, category pages, and the homepage should all remain crawlable. The most damaging robots.txt errors in practice are accidental rather than deliberate: a forward slash disallow that blocks the entire site, a directory rule that catches a commercial page path, or a rule added during a migration that was never removed. Checking the robots.txt file after any significant site change, migration, or platform update is a standard step in a technical SEO audit. For Wix-specific configuration guidance, the Wix technical SEO guide covers the correct setup in detail.

How does robots.txt affect AI search visibility?

Robots.txt has become significantly more consequential for AI search visibility than it was three years ago. In the traditional SEO context, the file managed Googlebot access and a handful of secondary search crawlers. In 2026, it is the primary configuration point for controlling whether AI search systems can access a site at all.

Every major AI search platform relies on its own crawler. ChatGPT Search uses OAI-SearchBot; OpenAI's GPTBot is a separate crawler that collects content for model training. Perplexity uses PerplexityBot. Microsoft Copilot uses Bingbot. Google AI Overviews uses Googlebot. Each crawler identifies itself by a distinct user-agent string, which means robots.txt rules can be applied to each one independently. A site that allows Googlebot but blocks OAI-SearchBot can still rank in traditional Google search but is unlikely to be cited in ChatGPT search answers. A site with a blanket disallow rule for all non-Googlebot crawlers is invisible to every AI search platform that does not crawl with Googlebot.
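The effect of a blanket rule for non-Googlebot crawlers can be sketched with Python's standard-library parser. The file content below is an assumed example of that pattern, not a real site's configuration, and the homepage URL is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed pattern: Googlebot is allowed, every other crawler is blocked.
GOOGLEBOT_ONLY = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

CRAWLERS = ("Googlebot", "Bingbot", "OAI-SearchBot", "PerplexityBot")


def homepage_access(robots_txt: str) -> dict:
    """Map each crawler name to whether it may fetch the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: parser.can_fetch(agent, "https://example.com/")
        for agent in CRAWLERS
    }


if __name__ == "__main__":
    for agent, allowed in homepage_access(GOOGLEBOT_ONLY).items():
        print(agent, "allowed" if allowed else "blocked")
```

Only Googlebot matches its own group; every other user-agent falls through to the wildcard group and is blocked.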

The most common pattern We Optimizz encounters in AI visibility audits is a robots.txt file configured before AI search crawlers became commercially relevant, which has never been reviewed since. A rule that blocks all bots except Googlebot, or that uses a wildcard disallow pattern that inadvertently catches AI crawler user-agents, can suppress AI search visibility across the entire site without triggering any visible error in Google Search Console. The site ranks in Google. The owner has no idea they are invisible in AI search. The fix takes minutes once the problem is identified.

The strategic question of which AI crawlers to allow is genuinely nuanced. Allowing search-oriented crawlers like OAI-SearchBot and PerplexityBot improves citation eligibility in ChatGPT and Perplexity without providing content for AI model training. Crawlers used primarily for training rather than search retrieval are a separate decision that depends on the business's position on content licensing. For most service businesses and agencies, allowing all major search-oriented AI crawlers and reviewing the configuration quarterly is the practical default. For the full AI visibility diagnostic specific to Wix sites, the Wix AI visibility guide covers the robots.txt check alongside the other crawlability and content structure issues that affect AI citation frequency.
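One possible policy along these lines is sketched below: search-oriented AI crawlers are allowed, while GPTBot, which OpenAI describes as its training crawler, is blocked. Blocking GPTBot is shown only as one option, since that choice depends on the business's licensing position. The /admin/ path and example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy, not a recommendation for every site.
POLICY = """\
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Disallow:

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may crawl `path` under POLICY."""
    parser = RobotFileParser()
    parser.parse(POLICY.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent in ("OAI-SearchBot", "PerplexityBot", "GPTBot", "Googlebot"):
        print(agent, allowed(agent, "/blog/post"))
```

A crawler that matches a named group follows only that group and ignores the wildcard group, so in this sketch OAI-SearchBot and PerplexityBot are not covered by the /admin/ rule. Any rule meant for them has to be repeated inside their own group.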

What are the most common robots.txt mistakes?

Robots.txt mistakes fall into two categories: rules that block too much and rules that block too little. Both cause problems, but accidental over-blocking is significantly more damaging because it can suppress organic visibility across entire sections of a site without producing any visible error.

Blocking the entire site is the most severe mistake and more common than most site owners realise. A single line reading Disallow: / under User-agent: * tells every crawler to avoid every page on the domain. This configuration is correct for staging environments and sites under development. It is catastrophic on a live production site. Sites launched from a staging environment without reviewing the robots.txt file have been blocked from Google indexing for weeks or months before the error was discovered.

Directory rules that accidentally catch commercial pages are the second most common problem. A rule intended to block /blog/drafts/ that is written as /blog/ blocks the entire blog section. A rule targeting URL parameters that uses a pattern broad enough to match important service page URLs removes those pages from the crawl path. Directory and pattern-based disallow rules need to be tested against the actual site structure before they are deployed.
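Testing a rule against the site structure can be as simple as the sketch below, which runs a list of pages that must stay crawlable through a candidate file and reports any that would be blocked. The MUST_RANK paths are hypothetical placeholders for a real site's key URLs, and Python's parser only approximates Google's matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical pages that must never be blocked.
MUST_RANK = ["/", "/services/", "/blog/robots-txt-guide", "/products/widget"]

TOO_BROAD = "User-agent: *\nDisallow: /blog/\n"         # blocks the whole blog
INTENDED = "User-agent: *\nDisallow: /blog/drafts/\n"   # blocks drafts only


def blocked_must_rank(robots_txt: str, agent: str = "Googlebot") -> list:
    """Return the MUST_RANK paths that `robots_txt` blocks for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in MUST_RANK if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    print("too broad:", blocked_must_rank(TOO_BROAD))
    print("intended:", blocked_must_rank(INTENDED))
```

Running a check like this before every deployment turns an accidental over-block into a visible failure rather than a slow drop in indexed pages.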

Outdated rules that are no longer relevant are a lower-severity but widespread issue. A rule added to block a now-deleted section, a URL pattern from a previous platform that no longer exists, or a crawler block from a migration that was never removed all accumulate in robots.txt files on sites that have been live for several years. They rarely cause active damage but they create confusion in audits and may interact unexpectedly with new site sections added after the original rule was written.

A missing sitemap declaration is a less critical but commonly overlooked gap. The robots.txt file can include a line declaring the location of the XML sitemap, which helps crawlers discover it directly without relying on Search Console submission. On Wix, the sitemap is automatically generated and its location is known to Googlebot, but declaring it explicitly in robots.txt is a best practice that costs nothing and supports crawler efficiency. For the full technical configuration that includes robots.txt, sitemap, and canonical setup on Wix, the Wix technical SEO guide covers each element in sequence.
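A sitemap declaration is a single Sitemap line with an absolute URL. The sketch below, which assumes Python 3.8 or later for RobotFileParser.site_maps(), reads that line back from a sample file. The www.example.com address is a placeholder.

```python
from urllib.robotparser import RobotFileParser

WITH_SITEMAP = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""

WITHOUT_SITEMAP = "User-agent: *\nDisallow: /admin/\n"


def declared_sitemaps(robots_txt: str) -> list:
    """Return the sitemap URLs declared in the file (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(declared_sitemaps(WITH_SITEMAP))
    print(declared_sitemaps(WITHOUT_SITEMAP))
```

The Sitemap line is independent of the user-agent groups, so it can sit anywhere in the file.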

How do you check and test your robots.txt file?

Checking robots.txt is one of the fastest technical SEO checks available: the file is publicly accessible at any domain and readable without any tools or accounts. Testing it properly means confirming what each rule actually does rather than assuming the configuration is correct.

The first check is simply viewing the file. Navigate to yourdomain.com/robots.txt in a browser. The file loads as plain text and shows every rule currently in effect. A site with no robots.txt file returns a 404 error, which tells crawlers that no restrictions apply and all pages are accessible. That is a valid configuration for most small sites. A site with an empty file or a file containing only a sitemap declaration is also fully accessible to all crawlers.

Google Search Console provides a robots.txt report, which replaced the older robots.txt Tester. The report lists the robots.txt files Google found for the site, when each was last fetched, and any syntax warnings or errors, which catches formatting problems that would cause the file to be misread by crawlers. To find out whether a specific URL is allowed or blocked, use the URL Inspection tool, which reports when a page is blocked by robots.txt. Together they are the most useful diagnostics for identifying why a specific page might be excluded from crawl.

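For AI crawler access specifically, the check is more manual. Search the file for the user-agent tokens of the major AI crawlers: OAI-SearchBot, GPTBot, PerplexityBot, and Google-Extended. If OAI-SearchBot or PerplexityBot appears in a disallow rule, or if a blanket wildcard rule blocks all crawlers without a separate group that allows these agents, AI search visibility is being suppressed. GPTBot and Google-Extended are different: they mainly control whether content is used for AI model training, and Google-Extended has no effect on inclusion in Google Search, so a block on either should be a deliberate choice rather than a leftover. The Wix AI visibility guide covers this check in detail alongside the other configuration steps that affect AI citation frequency.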
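The manual check can be approximated with a small helper that evaluates the file for each AI user-agent token instead of searching for the names by eye, which also catches blocks caused by a wildcard group. This is a sketch under the assumption that being blocked from the homepage indicates a site-wide block. The sample files are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ("OAI-SearchBot", "GPTBot", "PerplexityBot", "Google-Extended")


def blocked_ai_agents(robots_txt: str, path: str = "/") -> list:
    """Return the AI user-agent tokens that may not fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, path)]


# Invented sample files.
BLANKET_BLOCK = "User-agent: *\nDisallow: /\n"
TRAINING_ONLY_BLOCK = (
    "User-agent: GPTBot\n"
    "Disallow: /\n"
    "\n"
    "User-agent: *\n"
    "Disallow:\n"
)

if __name__ == "__main__":
    print(blocked_ai_agents(BLANKET_BLOCK))
    print(blocked_ai_agents(TRAINING_ONLY_BLOCK))
```

To audit a live site, the contents of its robots.txt file can be pasted into a string and passed to the same function.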

For sites with large URL structures or complex URL parameter patterns, a site crawler like Screaming Frog can simulate crawl behaviour against the robots.txt rules and identify which pages are being blocked and which are accessible. That simulation is faster and more reliable than manually checking individual URLs for large sites.

When does it make sense to get help with robots.txt?

Robots.txt is a simple file with a straightforward syntax. For most small to medium-sized service business websites, the default configuration on platforms like Wix, Framer, or Shopify is sensible and requires no manual changes. Viewing the file, confirming no important pages are blocked, and verifying AI crawler access are checks that most site owners can complete without specialist help in under fifteen minutes.

Where specialist involvement becomes the rational choice is complexity, scale, and the consequences of getting it wrong. On large ecommerce sites with faceted navigation, URL parameters, and thousands of indexable pages, robots.txt is a meaningful crawl budget management tool. Incorrectly written rules on those sites can suppress entire product categories or create URL patterns that generate crawl waste at scale. The volume makes manual checking impractical and the stakes make errors costly.

Platform migrations are the highest-risk moment for robots.txt problems. A staging environment typically blocks all crawlers during the build phase. If that block is not removed before the live domain is pointed at the new site, the production site launches with Googlebot locked out. That error is invisible to site owners who do not check the file immediately after launch, and its effects take days or weeks to surface in Search Console as indexation numbers drop without explanation.

The AI search dimension has added a new reason for specialist review of robots.txt files that did not exist three years ago. A site that was correctly configured for traditional search in 2022 may be inadvertently blocking every major AI crawler in 2026 because the configuration was written before those crawlers existed. That gap requires a deliberate review rather than assuming the existing file is still appropriate for the current search landscape.

We Optimizz includes robots.txt review as part of every technical SEO engagement, covering both traditional crawler access and AI search crawler configuration. If your site is ranking in Google but absent from ChatGPT or Perplexity responses, robots.txt is one of the first checks in our AI visibility audit. The free SEO scan identifies the most visible technical issues as a starting point, and a free discovery call gives you a direct review of your current crawl configuration.

Do you need help with Robots.txt?

A misconfigured robots.txt can block Google and every AI search crawler simultaneously. We Optimizz reviews and optimizes robots.txt configuration across Wix Studio, WordPress, Framer, Webflow, and Shopify, for both traditional search and AI visibility. 894 websites delivered across 35+ countries.