Allow and Disallow Specific Bots with Robots.txt

Worried about AI scraping your content? A robots.txt file lets you tell crawlers which parts of your site they may access. By adding a few simple rules, you can ask specific bots to stay away from your pages. For instance, you can disallow known AI crawlers from your entire site.

Managing web crawlers is essential for controlling how your site is crawled and indexed. The robots.txt file allows you to permit or restrict specific bots from accessing certain parts of your website. This simple text file, located in your site’s root directory, serves as a set of instructions for web crawlers.

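To allow or disallow a bot, name it in a User-agent line and follow that line with Allow or Disallow rules. A User-agent of * applies to every bot that has no more specific group of its own. For example, to block all bots from a directory, use: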

User-agent: *
Disallow: /private/

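One way to sanity-check a rule like this before deploying it is Python's standard-library `urllib.robotparser`. The sketch below parses the rule from a string rather than fetching it; the bot name `ExampleBot` and the `example.com` URLs are placeholders for illustration, not names from any real site.

```python
from urllib.robotparser import RobotFileParser

# The directory rule from the example above, as it would appear in robots.txt.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "ExampleBot" and example.com are hypothetical placeholders.
private_ok = parser.can_fetch("ExampleBot", "https://example.com/private/report.html")
public_ok = parser.can_fetch("ExampleBot", "https://example.com/blog/post.html")

print(private_ok)  # False: the path starts with /private/
print(public_ok)   # True: no rule matches this path
```

Note that this only reports what a compliant crawler would do; it does not enforce anything.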

To allow a specific bot, such as Googlebot, while blocking others:

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /

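The same kind of check can be applied to the Googlebot example. This is a minimal sketch using Python's `urllib.robotparser`; `ExampleBot` and the `example.com` URL are assumed placeholders standing in for any crawler other than Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Googlebot gets its own group; every other bot falls back to the * group.
rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The URL and "ExampleBot" are hypothetical.
url = "https://example.com/page.html"
googlebot_ok = parser.can_fetch("Googlebot", url)
other_ok = parser.can_fetch("ExampleBot", url)

print(googlebot_ok)  # True: matched by the Googlebot group
print(other_ok)      # False: matched by the * group
```

A bot follows only the most specific group that names it, which is why Googlebot is not affected by the `Disallow: /` in the `*` group.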

This lets you decide which crawlers may access your site. Here is a full example that allows some bots and blocks others:

# Allow these bots to crawl the entire site (an empty Disallow permits everything)
User-agent: Googlebot
User-agent: OAI-Searchbot
User-agent: ChatGPT-User
User-agent: Applebot
User-agent: FacebookExternalHit
User-agent: peer39_crawler
User-agent: criteoBot
Disallow:

# Block these bots from the entire site (bots not listed here are unaffected)
User-agent: PerplexityBot
User-agent: Amazonbot
User-agent: ClaudeBot
User-agent: Omgilibot
User-agent: FacebookBot
User-agent: anthropic-ai
User-agent: Bytespider
User-agent: Diffbot
User-agent: ImagesiftBot
User-agent: Omgili
User-agent: YouBot
User-agent: CCbot
User-agent: PiplBot
User-agent: SenutoBot
User-agent: ShortPixel
User-agent: Bytedance
User-agent: Meta-ExternalAgent
User-agent: petalbot
User-agent: seznambot
User-agent: Mechanize
Disallow: /

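To see how grouped User-agent lines behave, the sketch below runs a shortened version of the file above through Python's `urllib.robotparser`. Only four of the listed bots are included to keep it brief, and `ExampleBot` and the `example.com` URL are assumed placeholders for a crawler that appears in neither group.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the full file: two groups, each naming several bots.
rules = """\
User-agent: Googlebot
User-agent: Applebot
Disallow:

User-agent: ClaudeBot
User-agent: Bytespider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

url = "https://example.com/article.html"  # hypothetical URL
allowed_bot = parser.can_fetch("Googlebot", url)
blocked_bot = parser.can_fetch("ClaudeBot", url)
unlisted_bot = parser.can_fetch("ExampleBot", url)  # hypothetical bot name

print(allowed_bot)   # True: empty Disallow allows everything
print(blocked_bot)   # False: Disallow: / blocks the whole site
print(unlisted_bot)  # True: no group names this bot and there is no * group
```

The last result is worth noting: a bot that appears in neither group is not restricted by this file. Blocking every unlisted bot would require an additional `User-agent: *` group with `Disallow: /`.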

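These rules help keep crawlers out of areas you do not want crawled, reduce the server load caused by unwanted bots, and can support SEO by focusing crawlers on the pages that matter. Keep in mind, however, that robots.txt is a guideline, not a security measure. Compliance is voluntary and some bots ignore it, so combine it with other methods, such as server-side blocking or authentication, for stronger access control.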
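As one illustration of such an additional method, a server can inspect the User-Agent request header and refuse matching requests. The sketch below is a hypothetical helper, not part of any framework: the function name `is_blocked`, the shortened blocklist, and the sample header strings are all assumptions made for the example.

```python
# Hypothetical blocklist, using a few bot names from the robots.txt above.
BLOCKED_AGENTS = ("ClaudeBot", "Bytespider", "CCbot")


def is_blocked(user_agent_header: str) -> bool:
    """Return True if the header contains any blocked bot name (case-insensitive)."""
    header = user_agent_header.lower()
    return any(name.lower() in header for name in BLOCKED_AGENTS)


# Sample header values invented for illustration.
bot_request = is_blocked("Mozilla/5.0 (compatible; ClaudeBot/1.0)")
browser_request = is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0")

print(bot_request)      # True
print(browser_request)  # False
```

A web server or application could return a 403 response when a check like this matches. Because the User-Agent header is supplied by the client and can be spoofed, this is still only a partial measure.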
