Many artificial intelligence companies have recently been using crawlers to scrape content from websites and train their large language models on it. In response, Cloudflare has announced the launch of a simple, free tool that lets website operators quickly stop crawlers from scraping their content and from degrading the site's overall access performance.
Cloudflare said the tool is also available to users on its free plan and will be updated over time as it learns the crawling patterns of different crawlers, making it easier and safer for website operators to keep their content from being scraped and used for AI training.
According to Cloudflare, many crawler bots can bypass traditional web access restrictions, so many website operators have had to adopt stricter filtering to block them. That stricter filtering also affects more legitimate visits, which in turn hurts the sites' overall traffic and can even cause problems with their search engine rankings.
According to Cloudflare's statistics, ByteDance's crawler Bytespider accesses 40% of the websites that use Cloudflare services, while OpenAI's crawler GPTBot accesses 30%. Other crawlers with a relatively significant share include Amazon's Amazonbot and Anthropic's ClaudeBot, which account for about half of the total access volume.
However, even with tools available to stop crawlers from accessing website data in bulk, Cloudflare said that many AI companies still evade detection through circumvention techniques, so their crawlers continue to access website data in large quantities.
For example, it was previously reported that Perplexity AI bypassed website access rules and accessed website content without permission. If such behavior were blocked through strict filtering alone, the legitimate traffic of most websites could suffer. Cloudflare therefore expects to apply further machine learning methods to identify whether access behavior is normal and to better prevent crawlers from maliciously accessing data.
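To illustrate why name-based filtering alone falls short, here is a minimal Python sketch that checks a request's User-Agent header against the four crawler names mentioned above. It is not Cloudflare's method: the function name `match_ai_crawler` and the sample header strings are hypothetical, and the sketch assumes each crawler includes its name in the User-Agent it sends.

```python
from typing import Optional

# User-Agent tokens for the four crawlers named in the article.
# Assumption: each crawler announces itself with its name in the header.
AI_CRAWLER_TOKENS = ("Bytespider", "GPTBot", "Amazonbot", "ClaudeBot")


def match_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the first crawler token found in the User-Agent, else None."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        # Case-insensitive substring match on the crawler name.
        if token.lower() in ua:
            return token
    return None


if __name__ == "__main__":
    # Illustrative header strings, not captured traffic.
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0",
    ]
    for ua in samples:
        print(ua, "->", match_ai_crawler(ua))
```

A crawler that sends an ordinary browser User-Agent passes this check untouched, which is the kind of circumvention the article describes and the reason Cloudflare points to behavior-based detection instead.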





