I own a forum that currently has 23k online users, all of them bots. The last new post on that forum is from _2019_, and its topic is very niche. Why are so many bots there? The site must have been scraped a million times by now, yet these bots seem to fetch everything live, on the fly. I don’t get it.
Large-scale scraping tech is not as sophisticated as you'd think. A significant chunk of it is "get as much as possible, categorize and clean up later". Man, I really want the real web of the 2000s back, when things felt more or less "real"... how can we even get there?
The bots identify themselves as Google, Bing, and Yandex. I can’t tell whether the forum attributes them by IP address or just trusts their user agent; if it's the latter, it could basically be anyone.
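For what it's worth, the big engines all document a reverse-then-forward DNS check for exactly this. A minimal sketch in Python; the hostname suffixes are the ones I believe the engines document for their crawlers, so treat them as assumptions and double-check before relying on this:

```python
import socket

# Hostname suffixes the search engines document for their crawlers
# (assumed here from memory; verify against their current docs).
LEGIT_SUFFIXES = (
    ".googlebot.com", ".google.com",               # Googlebot
    ".search.msn.com",                             # Bingbot
    ".yandex.ru", ".yandex.net", ".yandex.com",    # Yandex
)

def is_real_crawler(ip: str) -> bool:
    try:
        host = socket.gethostbyaddr(ip)[0]  # reverse DNS lookup
    except OSError:
        return False
    if not host.endswith(LEGIT_SUFFIXES):
        return False
    # Forward-confirm: the hostname must resolve back to the same IP,
    # otherwise anyone controlling their own PTR record could pass.
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

# Example: an IP from Google's published crawler range should pass.
print(is_real_crawler("66.249.66.1"))
```

A spoofed user agent fails the reverse lookup, and a forged PTR record fails the forward confirmation, so this filters out the impostors cheaply.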
How do you define a user, and how do you define online?
If the forum counts each unique cookie as a user, mints a new cookie for every cookie-less request, and considers a user "online" for one hour after their last request, then this may actually be a single cookie-less scraper making ~6 requests per second (23,000 requests spread over the trailing hour). That may be a pain in its own way, but it's far from 23k online bots.
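To make that concrete, here's a toy sketch of that counting logic; the names and bookkeeping are made up for illustration, not any real forum software's internals:

```python
ONLINE_WINDOW = 3600  # seconds a "user" stays online after a request

def online_users(request_times):
    # Every cookie-less request mints a fresh cookie, i.e. a new "user".
    # A user counts as online if their request was within the window.
    now = request_times[-1]
    return sum(1 for t in request_times if now - t < ONLINE_WINDOW)

# One scraper, no cookies, hitting the forum at a steady rate:
rate = 23_000 / 3600                      # ~6.4 requests per second
hits = [i / rate for i in range(23_000)]  # one hour of timestamps
print(online_users(hits))                 # -> 23000 "online users"
```

One client, an hour of traffic, and the counter dutifully reports 23k bots.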
Why pay for storage when you do it for them?
Why do you keep it operating? Is it the aquarium value?
Sure you do by now. You are the hard drive.
When you have trillions of dollars being poured into your company by the financial system, and when, furthermore, there are no repercussions for behaving however you please, you tend not to care about that sort of "waste".
I have a site with a complete and accurate sitemap.xml that says when each of its ~6k pages was last updated (on average, maybe weekly or monthly). What do the bots do? They scrape every page continuously, 24/7, because of course they do. The amount of waste going into this AI craze is obscene. It's not even good content.
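For contrast, here's roughly what a polite crawler could do with that sitemap, sketched in Python. The URL and cutoff date are placeholders; day-level granularity on `<lastmod>` is plenty for pages that change weekly or monthly:

```python
import datetime as dt
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP = "https://example.com/sitemap.xml"   # placeholder URL
LAST_CRAWL = dt.date(2024, 1, 1)              # placeholder cutoff
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP) as resp:
    tree = ET.parse(resp)

for url in tree.iterfind("sm:url", NS):
    loc = url.findtext("sm:loc", namespaces=NS)
    lastmod = url.findtext("sm:lastmod", namespaces=NS)
    # Skip pages the sitemap says haven't changed since the last visit
    # (lastmod may be a date or a full datetime; the date prefix is enough).
    if lastmod and dt.date.fromisoformat(lastmod[:10]) <= LAST_CRAWL:
        continue
    print("would re-fetch:", loc)
```

Against a sitemap like the one described, that loop re-fetches a handful of changed pages instead of hammering all ~6k around the clock.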