Worried about AI scraping your content? Protect your site with a robots.txt file. By adding simple rules, you can tell specific bots, including AI crawlers, not to access your pages.
Managing web crawlers is essential for controlling how your site is indexed. The robots.txt file allows you to permit or restrict specific bots from accessing certain parts of your website. This simple text file, located in your site’s root directory, serves as a set of instructions for web crawlers.
To allow or disallow a bot, specify its user agent. For example, to block all bots from a directory, use:
    User-agent: *
    Disallow: /private/
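One way to sanity-check a rule like this before deploying it is Python's standard-library `urllib.robotparser`. The sketch below is only an illustration: the helper name `is_allowed`, the bot name `ExampleBot`, and the `example.com` URLs are assumptions, not part of any real site.

```python
from urllib.robotparser import RobotFileParser

# The rule from the text: every bot is kept out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and the URLs are placeholders for illustration.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/report.html"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/blog/post.html"))
```

The first check should come back as blocked and the second as allowed, since only paths under `/private/` match the rule.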
To allow a specific bot, such as Googlebot, while blocking others:
    User-agent: Googlebot
    Allow: /

    User-agent: *
    Disallow: /
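To see how a crawler picks its group, the same rules can be fed to `urllib.robotparser` and queried for two different user agents. This is a rough sketch: `ExampleBot` and the URL are made-up placeholders, and Python's parser is only an approximation of how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Googlebot gets its own group; every other bot falls back to the "*" group.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL and bot name, for illustration only.
url = "https://example.com/articles/robots-guide.html"
results = {agent: parser.can_fetch(agent, url) for agent in ("Googlebot", "ExampleBot")}
print(results)
```

A bot that matches a named group follows that group only; the `*` group applies to bots that match nothing more specific.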
Rules like these help safeguard your content while keeping control over which crawlers can access your site. Here is a full script that allows some bots and blocks others:
    # Allow specific bots (an empty Disallow permits the whole site)
    User-agent: Googlebot
    User-agent: OAI-Searchbot
    User-agent: ChatGPT-User
    User-agent: Applebot
    User-agent: FacebookExternalHit
    User-agent: peer39_crawler
    User-agent: criteoBot
    Disallow:

    # Block the listed bots from the entire site
    User-agent: PerplexityBot
    User-agent: Amazonbot
    User-agent: ClaudeBot
    User-agent: Omgilibot
    User-agent: FacebookBot
    User-agent: anthropic-ai
    User-agent: Bytespider
    User-agent: Diffbot
    User-agent: ImagesiftBot
    User-agent: Omgili
    User-agent: YouBot
    User-agent: CCbot
    User-agent: PiplBot
    User-agent: SenutoBot
    User-agent: ShortPixel
    User-agent: Bytedance
    User-agent: Meta-ExternalAgent
    User-agent: petalbot
    User-agent: seznambot
    User-agent: Mechanize
    Disallow: /
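Long lists like this are easy to get wrong by hand, so one option is to generate the file from two lists and then check the result. The sketch below assumes hypothetical helpers `build_group` and `build_robots_txt` and uses shortened subsets of the lists above; `UnlistedBot` is a made-up name.

```python
from urllib.robotparser import RobotFileParser


def build_group(agents, rule):
    """Return one robots.txt group: several User-agent lines sharing one rule."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append(rule)
    return "\n".join(lines)


def build_robots_txt(allowed, blocked):
    """Build a two-group robots.txt from an allow list and a block list."""
    # An empty Disallow value permits everything; "Disallow: /" blocks the whole site.
    groups = [build_group(allowed, "Disallow:"), build_group(blocked, "Disallow: /")]
    return "\n\n".join(groups) + "\n"


# Shortened, illustrative subsets of the lists in the script above.
ALLOWED = ["Googlebot", "ChatGPT-User", "Applebot"]
BLOCKED = ["ClaudeBot", "Bytespider", "PerplexityBot"]

robots_txt = build_robots_txt(ALLOWED, BLOCKED)
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/"  # placeholder URL
for agent in ("ChatGPT-User", "ClaudeBot", "UnlistedBot"):
    print(agent, parser.can_fetch(agent, url))
```

Note that a bot named in neither group is still allowed, because the file has no `User-agent: *` group. To block every bot that is not explicitly allowed, add a final `User-agent: *` group with `Disallow: /`.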
These rules help keep crawlers out of sensitive areas, reduce server load, and support SEO by steering search engines toward the pages you want indexed. Keep in mind, however, that robots.txt is a guideline, not a security measure. Some bots may ignore it, so combine it with other methods, such as authentication or server-side blocking, for stronger access control.
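As one example of such an additional method, a server can refuse requests whose User-Agent header matches a blocked crawler. The following is a minimal sketch for a Python WSGI application, not a complete defense: `BLOCKED_AGENTS`, `is_blocked`, `block_bots`, and `hello_app` are hypothetical names, the bot list is a small subset, and User-Agent headers can be spoofed.

```python
# Illustrative subset of crawler tokens, compared case-insensitively.
BLOCKED_AGENTS = ("claudebot", "bytespider", "perplexitybot")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENTS)


def block_bots(app):
    """Wrap a WSGI app so blocked crawlers receive 403 instead of content."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in application used only for this demonstration."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


protected_app = block_bots(hello_app)
```

Unlike robots.txt, this check is enforced by the server, but it only stops bots that identify themselves honestly. Content that must stay private still needs authentication.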