Cloudflare Is Blocking AI Bots From Your Website and You Don't Know It

June 2026

We launched a client website. Brand new domain. Clean Next.js codebase. Proper robots.ts file that said Allow: / for everything. We submitted it to Google Search Console, Bing Webmaster Tools, the whole checklist.

Two weeks later we pulled the live robots.txt. It looked like a hit list.

ClaudeBot. Blocked. GPTBot. Blocked. Google-Extended. Blocked. CCBot, Applebot-Extended, Bytespider, meta-externalagent, Amazonbot. All blocked. And a directive we never wrote: ai-train=no.

We didn't add any of that. Cloudflare did.

What Cloudflare AI Crawl Control Actually Does

When you add a domain to Cloudflare, a feature called AI Crawl Control turns on by default. It has its own managed robots.txt that gets injected at the edge, between your server and the internet.

Your app's robots.txt file can be perfect. Doesn't matter. Cloudflare rewrites the response before any bot sees it.

The feature lives in your Cloudflare dashboard under a few paths:

  • /ai/overview — summary of AI bot activity
  • /ai/bots — toggle individual AI crawlers on or off
  • /ai/metrics — traffic data from AI bots
  • /ai/robots — the managed robots.txt that Cloudflare injects

The "Cloudflare managed" toggle on the robots page is the one that burned us. When it's on, Cloudflare adds its own block rules on top of whatever your origin server returns.

How We Caught It

The site was tentmakerswc.com, a window cleaning company in Oklahoma. We built it, deployed it, and ran our full launch checklist. At the time that checklist had six layers. Layer 7 didn't exist yet.

We found the problem by fetching the live robots.txt directly:

curl https://tentmakerswc.com/robots.txt

The response had blocks we never wrote. Our Next.js robots.ts produced a three-line robots.txt. The live file had 30+ lines.
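Eyeballing two files works for one site. For more than that, a small script helps. What follows is a minimal sketch in TypeScript, not a finished tool. It takes the robots.txt your app is supposed to serve and the body that curl returned, then lists every live line your source doesn't contain. The function names and the example.com sample are placeholders made up for illustration. The comparison is a plain line-by-line set difference, so an injected line that also appears in your source won't be flagged.

```typescript
// Normalize a robots.txt body into trimmed, non-empty, non-comment lines.
function normalize(body: string): string[] {
  return body
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
}

// Return every line in the live file that the source file does not contain.
// Matching is case-insensitive and ignores blank lines and comments.
function findInjectedLines(expected: string, live: string): string[] {
  const known = new Set(normalize(expected).map((line) => line.toLowerCase()));
  return normalize(live).filter((line) => !known.has(line.toLowerCase()));
}

// Hypothetical sample data: what the app serves vs. what the edge returned.
const sourceRobots = `User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml`;

const liveRobots = `User-agent: GPTBot
Disallow: /

${sourceRobots}`;

// Logs the two lines that are not in the source file.
console.log(findInjectedLines(sourceRobots, liveRobots));
```

An empty array means the live file and your source agree. Anything else is a line somebody other than you added.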

Meanwhile, another client site we manage, trunorthpropertypartners.com, had a clean robots.txt with zero Cloudflare-managed content. The difference? We'd already turned off managed robots on that zone months earlier for a different reason.

That's the thing about silent defaults. You only find them after they bite you.

Why This Kills Your AI Visibility

Here's the part most people miss. Bing's index powers roughly 60% of AI grounding traffic. ChatGPT uses Bing. Copilot uses Bing. Perplexity pulls from Bing's index. And when GPTBot and CCBot are blocked in your robots.txt, the crawlers that feed those systems skip your pages. The AI tools can't cite you. They can't recommend you.

You won't see this in Google Analytics. There's no "AI bot blocked" report in Search Console. It's invisible unless you go look.

We also found a second problem on 4 client sites: Disallow: /_next/ in the robots.ts file. That blocked Googlebot from fetching the JavaScript chunks, CSS, and fonts it needs for render-based indexing. We caught it because the Cloudflare AI Crawl Control violations tab showed 5 Googlebot violations in a single 24-hour window on one site.

So Cloudflare's dashboard is useful for detection, even though it's the source of the other problem.
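For reference, a robots.ts that avoids the /_next/ problem can be very small. The sketch below assumes the Next.js App Router convention of a default export in app/robots.ts. The local Robots type is a stand-in so the snippet runs on its own. In a real project you'd import MetadataRoute from next and return MetadataRoute.Robots instead. The example.com sitemap URL is a placeholder.

```typescript
// app/robots.ts (sketch)
// Stand-in types so this file compiles without the "next" package.
// In a Next.js project, use: import type { MetadataRoute } from "next"
type RobotsRule = {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
};

type Robots = {
  rules: RobotsRule | RobotsRule[];
  sitemap?: string | string[];
};

export default function robots(): Robots {
  return {
    rules: {
      userAgent: "*",
      allow: "/",
      // Deliberately no disallow entry for /_next/. Crawlers that render
      // pages need the JavaScript, CSS, and font files served from there.
    },
    // Placeholder URL. Point this at your own sitemap.
    sitemap: "https://example.com/sitemap.xml",
  };
}
```

If you do need to keep crawlers out of a path, add a disallow entry for that path only and leave /_next/ alone.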

The Fix (5 Minutes)

  1. Log into your Cloudflare dashboard
  2. Select the domain
  3. Navigate to AI in the left sidebar, then Robots
  4. Find the Cloudflare managed toggle
  5. Turn it OFF
  6. Go to AI > Bots and verify individual bot toggles match what you want

Do this immediately after adding any new domain to Cloudflare. Before you submit to Google Search Console. Before you submit to Bing Webmaster Tools. Before you do anything else.

After toggling it off, fetch your live robots.txt again and confirm it matches what your app actually serves.

curl https://yourdomain.com/robots.txt

If the output matches your source code, you're good. If it doesn't, something else is injecting rules and you need to keep digging.

The Checklist We Use Now

We added a 7th layer to our site launch process after this. The full stack:

  1. DNS and SSL configuration
  2. Core Web Vitals baseline
  3. Structured data and meta tags
  4. Sitemap submission
  5. Google Search Console verification
  6. Bing Webmaster Tools verification
  7. Cloudflare AI Crawl Control: managed robots OFF

Layer 7 happens right after the Cloudflare zone is created, before any search engine submission. Not after. The order matters because search engines cache your robots.txt on first crawl. If they see blocks on day one, you're starting in a hole.

What About Security Concerns?

Some site owners want AI bots blocked. That's a valid choice. The problem is when it happens without you knowing.

If you've decided you don't want ChatGPT or Claude crawling your site, leave managed robots ON. That's intentional. What we're flagging is the default behavior catching people who never made that decision.

For most businesses, especially local service companies, you want AI bots crawling your site. When someone asks ChatGPT "who does window cleaning in Oklahoma City," you want your site in that answer. Blocking crawlers guarantees you won't be.

FAQ

How do I know if Cloudflare is blocking AI bots on my site right now?

Run curl https://yourdomain.com/robots.txt from your terminal and compare the output to your actual robots.txt source file. If there are User-agent blocks you didn't write (ClaudeBot, GPTBot, CCBot, etc.), Cloudflare is injecting them. You can also check Cloudflare dashboard > AI > Robots to see if "Cloudflare managed" is toggled on.
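If you'd rather not scan the file by eye, the check can be scripted. This is a rough TypeScript sketch under stated assumptions, not a full robots.txt parser. It reports which crawler names from your list sit in a group with Disallow: /. It only looks at groups that name the crawler explicitly, it ignores wildcard groups, and it doesn't weigh Allow rules against Disallow rules. The function name and the sample file are hypothetical.

```typescript
// Report which of the given user agents are named in a group that
// contains "Disallow: /". Simplified: explicit names only, no wildcards,
// no Allow/Disallow precedence.
function findBlockedAgents(robotsTxt: string, agents: string[]): string[] {
  const blocked = new Set<string>();
  let groupAgents: string[] = [];
  let inRules = false; // true once the current group has seen a rule line

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // strip trailing comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line

    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // A User-agent line after rule lines starts a new group.
      if (inRules) {
        groupAgents = [];
        inRules = false;
      }
      groupAgents.push(value.toLowerCase());
    } else if (field === "allow" || field === "disallow") {
      inRules = true;
      if (field === "disallow" && value === "/") {
        for (const agent of groupAgents) blocked.add(agent);
      }
    }
  }

  return agents.filter((agent) => blocked.has(agent.toLowerCase()));
}

// Hypothetical live file, pasted from the curl output.
const sample = `User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /`;

// Logs the three crawler names that are blocked; Googlebot is not among them.
console.log(findBlockedAgents(sample, ["GPTBot", "ClaudeBot", "CCBot", "Googlebot"]));
```

Run it against the curl output with the crawler names you care about. Any name that comes back is one the live file tells to stay away.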

Does turning off Cloudflare managed robots hurt my site's security?

No. AI Crawl Control manages which AI training crawlers can access your public pages. It has nothing to do with DDoS protection, WAF rules, or Bot Fight Mode. Those are separate Cloudflare features and stay active regardless of your AI Crawl Control settings.

Will this fix make my site show up in ChatGPT answers immediately?

Not immediately. AI systems need to recrawl your site after the blocks are removed. Submit your site to Bing Webmaster Tools if you haven't already, since Bing powers a large share of AI grounding queries. Expect weeks, not days, before you see changes in AI-generated responses.

What if I'm not using Cloudflare?

Other CDN and hosting providers may have similar features. Vercel, Netlify, and some WordPress hosts have added AI bot controls in the last year. The principle is the same: check your live robots.txt against your source file. If they don't match, something between your server and the internet is modifying it.

We ran into this on a real client launch and burned two weeks figuring out why AI tools couldn't see a brand new site. If your robots.txt looks wrong and you can't figure out why, we're happy to look at it with you. You can also read more about how we approach SEO and generative engine optimization or our take on what AI actually means for your business.