Hacker News

Gigachad today at 6:55 AM

At work the conversation is that simultaneously everyone is using LLMs now, yet we receive virtually no traffic through them. The LLMs scrape our data, provide an answer to the user, and we see nothing from it.


Replies

jrmg today at 7:00 AM

I have the same worry about LLMs in general - I know that ‘model collapse’ seems to be an unfashionable idea, but when the internet’s just full of garbage (soon?…), what are we going to train these things on?

Barbing today at 7:26 AM

How often are they scraping?

Also generally wondering… Do labs view scraping as legally safer than trying to cache the Internet? I figure it’s easy to mark certain content as all but evergreen (can do a quick secondary check for possible new news).

Maybe caching everything is too expensive?