An Introduction to Google-Extended
Google-Extended lets you control whether your website content trains Google’s AI models while keeping your search rankings completely untouched. According to Cloudflare’s 2025 Year in Review, Google-Extended is among the most frequently disallowed user agents in robots.txt files, reflecting growing publisher concerns about AI training data usage.
The distinction matters because this crawler operates independently from Googlebot. You can block Google-Extended entirely and maintain full visibility in Google Search, Maps, and other search products. For website owners weighing their options, understanding how this crawler works provides a strategic advantage in the evolving relationship between publishers and AI platforms.
What Google-Extended Actually Does
Google-Extended is a standalone product token, not a separate crawler with its own fetching infrastructure. Website owners reference it in robots.txt to control whether their content can be used to train Google’s AI models, specifically Gemini Apps and the generative APIs in Vertex AI. Google’s existing crawl infrastructure checks for the token and honors whatever directives it carries.
Crucially, the token operates independently of Googlebot’s search indexing. Blocking Google-Extended has no effect on how your site is crawled for search, how it ranks, or whether it appears in Google Search results. It is purely a switch for AI training permissions.
Why Publishers Are Blocking AI Crawlers
The numbers tell a clear story. BuzzStream research found that 79% of top news websites in the US and UK now block at least one AI training crawler. The blocking rate increased 336% over the past year, according to Tollbit’s Q2 2025 report.
“Publishers are blocking AI bots using the robots.txt because there’s almost no value exchange,” explains Harry Clarkson-Bennett, SEO Director at The Telegraph. “LLMs are not designed to send referral traffic and publishers still need traffic to survive.”
The crawl-to-referral ratio explains the frustration. Cloudflare data shows Anthropic’s ratio was 73,000:1 in June 2025, meaning it crawled 73,000 pages for every single referral it sent back to publishers. Google Search’s ratio was 14:1. AI systems consume massive amounts of content while returning almost nothing to the original creators.
Specific blocking statistics from Cloudflare’s analysis of top 10,000 domains:
- GPTBot: Most frequently blocked AI crawler with 312 domains disallowing access
- CCBot: Second most blocked
- Google-Extended: Third most blocked, but only 5.6% of robots.txt files include it
- ClaudeBot: Now blocked by approximately 5.8 million websites, up from 3.2 million in July 2025
How to Block Google-Extended (Step by Step)
Implementing the block takes less than five minutes. Add these lines to your robots.txt file in your website’s root directory:
User-agent: Google-Extended
Disallow: /
This tells Google’s AI training crawler to stay away from your entire site. Your search visibility remains completely unaffected because Googlebot operates independently.
For partial access, where you want some content available for AI training but not others:
User-agent: Google-Extended
Allow: /blog/public/
Disallow: /blog/premium/
Disallow: /research/
This configuration lets Google-Extended access your public blog posts while protecting premium content and research materials.
Verifying Your Implementation
After adding the directives, verify they’re working:
- Check your robots.txt is accessible at yourdomain.com/robots.txt
- Review the robots.txt report in Google Search Console (the standalone robots.txt Tester tool has been retired)
- Monitor server logs for Google-Extended user agent strings
Important note: Google-Extended is a control token, not a traditional crawler. According to technical guides, it won’t appear in your server logs the same way Googlebot does.
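Before relying on server logs, you can sanity-check your directives locally with Python’s built-in robots.txt parser. This is a minimal sketch that feeds it rules matching the full-block example above; the path used in the check is just an illustration:

```python
from urllib.robotparser import RobotFileParser

# Sample rules mirroring the full-block example above.
RULES = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Google-Extended should be denied everywhere on the site...
print(parser.can_fetch("Google-Extended", "/blog/post"))  # False
# ...while Googlebot, which has no matching rule, stays unrestricted.
print(parser.can_fetch("Googlebot", "/blog/post"))  # True
```

Running the same check against your live file (via `parser.set_url(...)` and `parser.read()`) confirms the rules Google will actually see.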
The Bigger Picture: AI Crawler Traffic in 2025
AI crawlers now represent a significant portion of web traffic. Cloudflare’s analysis found that AI bots (excluding Googlebot) accounted for 4.2% of all HTML requests across their network in 2025. Googlebot alone handled 4.5%, more than all other AI bots combined.
The growth has been explosive:
- GPTBot traffic increased 305% between May 2024 and May 2025
- Total crawler traffic rose 18% in the same period
- AI “user action” crawling increased more than 15 times in 2025
Cloudflare research found that only 14% of the top 10,000 domains have specific rules for AI crawlers in their robots.txt files. This leaves the vast majority of websites unprotected, with AI systems freely collecting their content for training.
“It is never too late to block,” notes Anthony Katsur, CEO of IAB Tech Lab. “The LLMs will come back and they will recrawl content in order for that information to stay fresh and relevant and accurate.”
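Curious where your own site falls among that unprotected majority? One quick audit is to scan your robots.txt for the user agents discussed in this article. A minimal sketch, with an illustrative (not exhaustive) crawler list:

```python
# Illustrative subset of AI crawler user agents covered in this article.
AI_CRAWLERS = ["Google-Extended", "GPTBot", "ClaudeBot", "CCBot", "Bytespider"]

def audit_robots_txt(robots_txt: str) -> dict:
    """Return {crawler: bool} for whether each AI crawler is named
    in a User-agent line of the given robots.txt text."""
    named = set()
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "user-agent":
            named.add(value.strip().lower())
    return {bot: bot.lower() in named for bot in AI_CRAWLERS}

sample = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""
print(audit_robots_txt(sample))
# {'Google-Extended': True, 'GPTBot': True, 'ClaudeBot': False,
#  'CCBot': False, 'Bytespider': False}
```

Note this only checks whether a crawler is mentioned at all; whether it is actually disallowed depends on the rules under each User-agent group.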
Strategic Considerations for Your Website
The decision to block AI crawlers depends on your business model and content strategy. Consider these factors:
Block Google-Extended if you:
- Publish premium or subscription content
- Want to protect intellectual property
- Generate revenue primarily through direct site traffic
- Create original research or proprietary data
Allow Google-Extended if you:
- Want visibility in AI-powered search results
- Benefit from brand mentions in AI responses
- Prioritize reach over content protection
- Create educational or public-interest content
For local service businesses, the calculation differs from major publishers. Your Google Business Profile and local search rankings depend on Googlebot, not Google-Extended. Blocking the AI crawler won’t affect your map pack visibility or local organic rankings.
Beyond Robots.txt: Additional Protection Methods
Robots.txt relies on voluntary compliance. According to Tollbit data, 13.26% of AI bot requests ignored robots.txt directives in Q2 2025, up from 3.3% in Q4 2024.
For stronger enforcement:
Server-Level Blocking: Configure Apache .htaccess or Nginx to return 403 Forbidden responses to specific user agents.
Cloudflare AI Audit: Cloudflare’s tool helps publishers monitor crawler activity and enforce blocking rules automatically.
IP Allowlisting: Google publishes IP ranges for its crawlers in JSON files. Firewall rules can verify requests against these ranges while blocking spoofed user agents.
Rate Limiting: Set limits on requests per minute to prevent aggressive crawling from overwhelming your server resources.
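The IP-allowlisting idea can be sketched with Python’s standard ipaddress module. Google publishes its real crawler ranges in JSON files; fetching and parsing that feed is left out here, and the prefixes below are documentation-range placeholders, not Google’s actual ranges:

```python
import ipaddress

# Hypothetical placeholders standing in for Google's published crawler
# ranges (the real list comes from Google's JSON files).
GOOGLE_CRAWLER_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),   # placeholder (TEST-NET-1)
    ipaddress.ip_network("2001:db8::/32"),  # placeholder (IPv6 documentation range)
]

def is_verified_google_ip(addr: str) -> bool:
    """True if the request IP falls inside a published crawler range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in GOOGLE_CRAWLER_PREFIXES)

# A request claiming a Google user agent from outside these ranges
# is likely spoofed and can be blocked with confidence.
print(is_verified_google_ip("192.0.2.44"))   # True
print(is_verified_google_ip("203.0.113.9"))  # False
```

A firewall or middleware layer would call a check like this only for requests whose user-agent string claims to be a Google crawler.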
Frequently Asked Questions
Does blocking Google-Extended hurt my SEO rankings?
No. Google’s technical documentation explicitly states that blocking Google-Extended has no impact on search rankings, indexation, or visibility. Only blocking Googlebot affects your search presence. These are separate crawlers with completely different functions.
Will blocking Google-Extended prevent my site from appearing in AI Overviews?
Blocking Google-Extended stops your content from training Gemini and Vertex AI models. However, AI Overviews in Google Search are built from Googlebot data. To drop out of search-driven AI features entirely, you would need to block Googlebot, which also removes you from search results; short of that, there’s currently no robots.txt token that keeps you in search but out of AI Overviews.
How do I know if my robots.txt is working?
Check that your file is accessible at yourdomain.com/robots.txt and review the robots.txt report in Google Search Console. Note that Google-Extended won’t appear in standard server logs like Googlebot does because it functions as a control token rather than a traditional crawler.
Should small business websites bother blocking AI crawlers?
It depends on your content. Most local service businesses have limited proprietary content worth protecting. Your competitive advantage comes from service quality, reviews, and local visibility, not from preventing AI training. However, if you publish substantial original content like guides or research, blocking may make sense.
What other AI crawlers should I consider blocking?
Major training crawlers include GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), and Bytespider (ByteDance). Search-focused crawlers include PerplexityBot and OAI-SearchBot. Blocking training crawlers protects your content while search crawlers may provide visibility in AI search results.
Take Control of Your Content
Understanding Google-Extended gives you an informed choice about how your website participates in AI development. The crawler’s separation from search indexing means you can make this decision based on business strategy rather than SEO concerns.
Whether you block AI crawlers entirely, allow selective access, or embrace full participation, the choice should align with your content goals and revenue model. For local businesses focused on search visibility, this decision rarely impacts day-to-day operations. For publishers with substantial original content, it’s increasingly becoming a strategic priority.
Need help evaluating your website’s technical SEO configuration or understanding how crawlers interact with your content? Contact PushLeads for a consultation.
Who is Jeremy Ashburn?
Jeremy Ashburn has a unique blend of graphic design, web design, sales and marketing, business, and SEO experience. He’s the President and owner of Pushleads.com, an SEO agency with the vision of “creating more traffic with less effort.” Jeremy’s clients have generated millions of dollars through all forms of digital marketing.
After graduating from college, Jeremy worked for a “fast and furious” advertising agency, then spent eight years as an executive recruiter while teaching himself web design, SEO, Google Ads, Facebook Ads, retargeting, and pay-per-click advertising.
Over the past fifteen years, Jeremy has created hundreds of websites, built blogs that earn thousands of dollars, become a pro at ranking websites in Google, increased ROI for his clients, and helped them grow dramatically.
Google’s latest documentation updates for the Google-Extended crawler bring practical changes that affect website owners and content managers. These modifications focus on AI training permissions and search visibility, with specific updates reflecting the transition from Bard to Gemini Apps. The changes give site owners direct control over how their content supports AI development.
A Little More About Google-Extended
Since its launch on September 28, 2023, Google-Extended has been offering web publishers a new way to manage their site’s visibility. It’s all about control. You can tell Google-Extended to take a hike or welcome it in, depending on whether you want your site’s content to train AI models. This choice is made possible through the Robots Exclusion Protocol, where you can set the rules of engagement for this particular crawler.
The technical implementation of Google-Extended builds upon established web crawling standards. Website administrators use specific commands in their robots.txt files to set permissions. These commands act as digital barriers or welcome signs, determining whether AI training systems can access and process the site’s content.
Though Google calls it a “standalone product token,” that term might sound like jargon. Simply put, it’s a way for you to specify how Google’s AI-focused crawlers interact with your site.
The initial announcement made things clear:
“We’re rolling out Google-Extended to give you more control. It’s your call if you want your site to contribute to making Bard and Vertex AI’s generative APIs smarter. This decision could influence how advanced these AI models get over time.”
Website owners who want to restrict access can implement a simple code block in their robots.txt file:
User-agent: Google-Extended
Disallow: /
Keeping Up with Changes
Google’s good at keeping a log of updates and changes, especially ones that matter to web publishers and marketing consultants. They’ve tweaked the Google-Extended documentation, especially after rebranding Bard to Gemini Apps. Now, Google-Extended’s crawling efforts are aimed at Gemini Apps and Vertex AI, but here’s the kicker: it won’t mess with your Google Search standing.
The separation between AI training and search indexing represents a significant technical distinction. Search crawlers continue their regular operation, maintaining the existing SEO structures and ranking factors. Meanwhile, Google-Extended operates as an independent system, collecting data specifically for AI model training.
So, What's the Big Update?
The main takeaway from the recent changes is that Google-Extended’s crawling is now focused on Gemini Apps, and it leaves Google Search rankings untouched.
In their own words:
“We’ve updated our terms to reflect Bard’s name change to Gemini Apps. Based on your feedback, we’ve made it clear: Google-Extended’s crawling is all about Gemini Apps and doesn’t affect Google Search.”
They’ve ditched the Bard name in favor of Gemini and added a reassuring note:
“Google-Extended won’t impact how your site ranks or appears in Google Search.”
Technical Implementation Details
The robots.txt configuration accepts several variations to give site owners precise control. Administrators can block specific directories while allowing others, creating a balanced approach to content sharing. For example:
User-agent: Google-Extended
Allow: /public/
Disallow: /private/
This granular control helps organizations maintain a strategic balance between contributing to AI advancement and protecting sensitive information.
Practical Applications and Benefits
Website owners now have three clear options for managing their content’s role in AI development:
1. Full access: Allow Google-Extended to crawl all content, supporting AI model training across their entire site.
2. Partial access: Use directory-specific rules to share selected content while protecting other areas.
3. Complete restriction: Block Google-Extended entirely, opting out of AI training contributions.
These choices let organizations align their content strategy with their business objectives and data-sharing preferences.
Future Implications
The separation between search rankings and AI training creates new opportunities for content strategy. Organizations can now participate in AI development without worrying about search visibility, which opens the door to selective content sharing driven by business goals rather than SEO considerations. The emphasis shifts toward quality and relevance rather than search algorithms alone, and value-driven content of that kind tends to build brand loyalty and, over time, stronger search performance as well.
For those managing a website and looking to fine-tune their exposure to AI training without affecting search visibility, these updates provide clarity and control. PushLeads highlights these changes to ensure you’re in the know, especially if you’re navigating the complexities of online visibility and search engine optimization as a marketing consultant.
Read Next: Essential Principles of Keyword Research
CLICK HERE to schedule your FREE consultation TODAY!