What an amazing set of data!
The "Generative AI services popularity" [1] chart is surprising. ChatGPT being #1 makes sense, but Character.AI being #2 is surprising, being ahead of Anthropic, Perplexity, and xAI. I suspect this data is strongly affected by the services' DNS caching strategies.
The other interesting chart is "Workers AI model popularity" [2]. `llama-3-8b-instruct` has been leading at 30% to 40% since April. That makes it hands down the most popular weights-available small "large language model". I would have expected Meta's `m2m100-1.2b` to be more used, and Alphabet's `Gemma 3 270M` to start appearing. People are likely using the most powerful model that fits on a CF worker.
As a shameless plug, for more popularity analysis, check out my "LLM Assistant Census" [3].
[1] https://radar.cloudflare.com/ai-insights#generative-ai-servi...
[2] https://radar.cloudflare.com/ai-insights?dateRange=24w#worke...
I recently wanted to find out which company crawls the deepest. The OpenAI bot was the most thorough one: it followed 405 links [1].
One way that Cloudflare is gatekeeping is by declaring which bots are AI Bots. Common Crawl's CCBot is used for a lot of stuff -- it's an archive, there are more than 10,000 research papers citing Common Crawl, mostly not AI -- but Cloudflare deems CCBot to be an "AI Bot", and I suspect most website owners don't have any idea what the list of AI Bots is and how they were chosen.
Cloudflare is positioning themselves to be the Internet's tax collector.
Very interesting data, particularly the AI rankings based on DNS requests. They appear to be off by one day: switching to a 4-week period, Character.AI is consistently #2 on weekends with Claude at #3, and they swap on weekdays. But it shows the switch for Sunday and Monday. Probably a US time vs UTC issue.
This data is incredibly valuable for both AI companies and publishers. CF gets unprecedented visibility into who's crawling what, when, and how much. Wouldn't be surprised if this becomes a premium product - 'pay for priority bot verification' or 'detailed crawl analytics'.
If I use Anthropic’s API for search, but then send user traffic directly to websites after showing the user the link, there’s no way for Cloudflare to attribute that search to Anthropic.
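To illustrate the "US time vs UTC" guess above, here is a minimal sketch of how a request made on a Sunday evening in the US lands in Monday's bucket once timestamps are grouped by UTC day. The fixed UTC-8 offset and the sample timestamp are assumptions for illustration only; nothing here is taken from how Radar actually buckets its data.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for US Pacific time (no DST handling).
US_PACIFIC = timezone(timedelta(hours=-8), "PST")


def utc_weekday(local_dt):
    """Return the weekday name of a timezone-aware datetime when viewed in UTC."""
    return local_dt.astimezone(timezone.utc).strftime("%A")


# Hypothetical request at 8 pm on Sunday, 2 March 2025, US Pacific time.
sunday_evening = datetime(2025, 3, 2, 20, 0, tzinfo=US_PACIFIC)
local_day = sunday_evening.strftime("%A")
utc_day = utc_weekday(sunday_evening)

print(local_day, "->", utc_day)
```

If daily rankings are cut on UTC boundaries, part of the US weekend evening traffic would be counted on the following UTC day, which could shift where the weekend pattern appears.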
That makes the ratios of crawl to referrals shown suspect.
I suppose these figures don't include the worst-behaving crawlers that hide their identity, e.g. by using residential proxies.
> Verified via WebBotAuth
I sincerely hope this initiative fails and no one bends over for Cloudflare on this.
If it’s been this way since February, how have AI crawlers not “caught up” yet?
The internet is big, but it isn’t that big. I’d expect to see a sudden dropoff as they start re-checking content that hasn’t changed, with some sort of exponential backoff.
Instead, my takeaway is that AI crawlers aren’t indexing to store content in the way we’re used to with typical search engines, and unilaterally blocking these crawlers across the board would have quite the “effect”.
Instead of just rankings of AI chatbots, I wish there were volume numbers, because I feel the volume skews heavily to the top.
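For reference, the "exponential backoff" on unchanged content mentioned above could look roughly like this. It is a minimal sketch, not how any real crawler is known to schedule re-checks; the function name, the one-hour base, and the 30-day cap are all hypothetical.

```python
def next_recheck_interval(current_hours, changed, base_hours=1.0, max_hours=24.0 * 30):
    """Double the wait after an unchanged fetch; reset to the base when content changed.

    All parameters are hypothetical: a 1-hour base and a 30-day cap.
    """
    if changed:
        return base_hours
    return min(current_hours * 2, max_hours)


# Simulate five fetches of one page: three unchanged, one changed, one unchanged.
interval = 1.0
schedule = []
for changed in [False, False, False, True, False]:
    interval = next_recheck_interval(interval, changed)
    schedule.append(interval)

print(schedule)  # [2.0, 4.0, 8.0, 1.0, 2.0]
```

A crawler scheduling re-checks this way would show request volume falling off over time for static pages, which is the dropoff the comment expected to see.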
My experience disagrees with the 'Respects robots.txt' column for most of the bots listed. Would love to see more details of how they determine that metric.
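One way a site owner could check this for themselves is to replay access-log entries against their own robots.txt. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the bot name `ExampleBot`, and the log entries are made up for illustration, and this is not a description of how Cloudflare computes its column.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is blocked everywhere, everyone else only from /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def violates_robots(user_agent, path):
    """True if the given user agent fetched a path that robots.txt disallows for it."""
    return not parser.can_fetch(user_agent, path)


# Hypothetical (user agent, path) pairs pulled from an access log.
log = [
    ("GPTBot", "/blog/post"),
    ("ExampleBot", "/blog/post"),
    ("ExampleBot", "/private/report"),
]
violations = [entry for entry in log if violates_robots(*entry)]
print(violations)
```

This only catches bots that identify themselves honestly in the User-Agent header, so it says nothing about crawlers that hide their identity.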
There's a nice write-up by Cloudflare from July covering some of those charts: https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-r...
The companies I avoid because they tried to charge my card even though I stopped using their service... Anthropic and OpenAI
So interesting they are orders of magnitude worse than the others with the crawl:user-request ratio... noted
I would have guessed that it's a minority, but less than 5% of web traffic being explicitly human-initiated is still a somewhat shocking statistic.
Naive question maybe: do these AI companies crawl/ingest video/audio yet? If yes, is that included in the stats?
My main learning is that character.ai is consistently in the top four, along with ChatGPT (always #1) and Claude. I didn't even know it was in the running.
According to that report, Grok has no respect whatsoever for anything
some related trends https://blog.cloudflare.com/crawlers-click-ai-bots-training/
How is it possible that training is so much higher than search as a use case?
Would be interesting to see if LinkedIn (and the like, who don't want to be crawled) sign up for the pay-per-crawl that CF may come up with.
This feels like Cloudflare is no longer solely on the "serving the website owner" side.
How is Googlebot not considered an AI bot? Googlebot feeds all the AI snippets and zero-click internet. Googlebot is an AI bot.
Perhaps this data could provide a useful example for Apple and OpenAI in their defence against Elon's laughable lawsuit. It's funny how xAI is almost at the bottom.
Nothing better than a nice and clean dashboard
These AI companies popping up like mushrooms remind me of the .com bubble in the early 2000s.
Does crawl-to-refer mean that for every 40k pages ClaudeBot crawls, only 1 outbound link is clicked from it?
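Under the reading in the question above (crawl requests divided by referral visits), the arithmetic is a single division. This is a sketch of that interpretation only; the function name and the example counts are hypothetical, and the exact definition Radar uses should be checked against Cloudflare's own write-up.

```python
def crawl_to_refer_ratio(crawl_requests, referral_requests):
    """Crawl requests per referral visit; infinite if the bot never refers anyone."""
    if referral_requests == 0:
        return float("inf")
    return crawl_requests / referral_requests


# Hypothetical counts matching the 40k:1 example in the question.
ratio = crawl_to_refer_ratio(40_000, 1)
print(f"{ratio:,.0f} pages crawled per referral")
```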
> OpenAI
> Verified via WebBotAuth: In Progress
Feels like Cloudflare are positioning themselves as the gatekeepers of "good bots". The fact there is an "In Progress" state at all is telling: for everyone else, the answer is "No", but for OpenAI, the answer is "we're not doing it yet, but we've told CF that we plan to".