> VNPT and Bunny Communications are home/mobile ISPs. I cannot be sure that their IPs belong to domestic users, but it seems worrisome that these are among the top scraping sources once you remove the most obviously malicious actors.
This will be in part people on home connections tinkering with LLMs, blindly running some scraper instead of (or as well as) using the common pre-scraped datasets and their own data. A chunk of it will come from people who have been compromised, perhaps by installing or updating a browser add-in or “free” VPN client that has become (or always was) nefarious. Their home connections are then farmed out by VPN providers selling “domestic IP” services, which people running scrapers buy.
Disagree on the method:
I recall that bot farms use pre-paid SIM cards for their data connections so that their traffic comes from a good residential ASN.
No client compromise is required; it's network abuse that gives you a good reputation if you use mobile data.
But yes, selling botnets made of compromised devices is also a thing.
I have trouble imagining any home LLM tinkerer who tries to run a naive scraper against the rest of the internet as part of their experiments.
Much more likely are those companies that pay (or trick) people into running proxies on their home networks to help with giant scraping projects that want to rotate through thousands of "real" IPs.