Hacker News

vachina, today at 4:09 AM

This is demonstrably false, given the success of many AI companies' scrapers.


Replies

Nextgrid, today at 4:32 AM

LLMs aren't a good indicator of success here, because an LLM trained on 80% of the data is just as good as one trained on 100%, assuming the types/categories of data are evenly distributed. Proxies help when you do need access to 100% of the data, including data behind social media login walls.