Hacker News

jimberlage, yesterday at 8:19 PM (2 replies)

I remember back in the day, when SEO was a more viable channel, being surprised at how much of the game was convincing Google to crawl you at all.

I naively assumed that they would be happy to take in any and all data, but they had a fairly sophisticated algorithm for deciding "we've seen enough, we know what the next page in the sequence is going to look like." They value their bandwidth.

That led to a lot of gaming of how you optimally split content across high-value pages for search terms (the 5 most relevant reviews go on the page targeting the New York metro, the next 5 most relevant on the page for LA, etc.).
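To make the review-splitting idea concrete, here is a minimal sketch, assuming the reviews are already sorted by relevance and the target pages are already sorted by value. The function name, the page slugs, and the `per_page` parameter are all hypothetical and only illustrate the "top 5 to New York, next 5 to LA" pattern; they are not anything Google or a specific SEO tool defines.

```python
from typing import Dict, List


def split_reviews(
    reviews_by_relevance: List[str],
    pages_by_value: List[str],
    per_page: int = 5,
) -> Dict[str, List[str]]:
    """Assign consecutive chunks of reviews to pages, best chunk to best page.

    Both input lists are assumed to be pre-sorted (most relevant review
    first, highest-value page first). Pages beyond the available reviews
    receive an empty list.
    """
    allocation: Dict[str, List[str]] = {}
    for index, page in enumerate(pages_by_value):
        start = index * per_page
        allocation[page] = reviews_by_relevance[start:start + per_page]
    return allocation


# Hypothetical data: 12 reviews, 3 metro landing pages.
reviews = [f"review-{n}" for n in range(1, 13)]
pages = ["new-york", "los-angeles", "chicago"]
allocation = split_reviews(reviews, pages)
```

With this data, the New York page gets reviews 1 through 5, the LA page gets 6 through 10, and the last page gets whatever is left over.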

I'm surprised again, honestly. I kind of assumed the AI race meant that Google would go back to hoovering all data at the cost of extra bandwidth, but my assumption clearly doesn't hold. I can't believe I knew all that about Google and still made the same assumption twice.


Replies

HWR_14, yesterday at 9:31 PM

Google may be aggressively crawling for AI and only making a small subset visible to the search database.

jimberlage, yesterday at 8:23 PM

And from the comments below, sounds like they might be aggressively crawling still, but unidentified or with a different crawler identity. So perhaps they are hoovering up everything in the AI era.