I remember back in the day, when SEO was a more viable channel, being surprised at how much of the game was convincing Google to crawl you at all.
I naively assumed that they would be happy to take in any and all data, but they had a fairly sophisticated algorithm for deciding "we've seen enough, we know what the next page in the sequence is going to look like." They value their bandwidth.
It led to a lot of gaming around how to optimally split content across high-value pages for search terms (the 5 most relevant reviews go on the page targeting the New York metro, the next 5 most relevant on the LA page, etc.).
I'm surprised again, honestly. I kind of assumed the AI race meant Google would go back to hoovering up all data at the cost of extra bandwidth, but my assumption clearly doesn't hold. I can't believe I knew all that about Google and still made the same assumption twice.
And from the comments below, it sounds like they might still be crawling aggressively, just unidentified or under a different crawler identity. So perhaps they are hoovering up everything in the AI era after all.
Google may be crawling aggressively for AI while only surfacing a small subset in the search index.
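
If you want to check whether the "unidentified" traffic in your logs is actually Google, the verification Google itself documents is a reverse-then-forward DNS check on the crawler's IP. Here's a minimal sketch of that check (the function name and the sample IP are my own, not anything official):

    import socket

    def is_googlebot(ip: str) -> bool:
        """Check whether a crawler IP really belongs to Google:
        reverse-DNS the IP, confirm the hostname is under
        googlebot.com or google.com, then forward-resolve that
        hostname and confirm it maps back to the same IP."""
        try:
            hostname, _, _ = socket.gethostbyaddr(ip)  # reverse DNS lookup
        except socket.herror:
            return False
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        try:
            forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward lookup
        except socket.gaierror:
            return False
        return ip in forward_ips

    # Example: run against IPs from access logs that present a Googlebot
    # user agent but don't show up in your Search Console crawl stats.
    print(is_googlebot("66.249.66.1"))  # inside a published Googlebot range

Traffic that fails this check but claims a Googlebot user agent is someone else entirely; traffic from genuine Google IPs under a different crawler name would support the "crawling for AI, not for the index" theory.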