Hacker News

mtndew4brkfst today at 12:27 PM

What is the specific, concrete purpose of downloading millions of URLs per hour across different domains if it's "not doing anything wrong"?


Replies

decide1000 today at 1:38 PM

Mostly e-commerce and pricing data. I work for marketplaces, brands, retail stores and even our own SaaS competitors. We match the EAN (GTIN) to the correct SKU within seconds (Google Shopping, Amazon, etc.). Part of the matching relies on our own trained ML models.

big-and-small today at 12:57 PM

Might it be for scraping content to train an LLM? Oh no, only big tech is allowed to do that...