AI is in danger of peeing in its own water source. It's unbelievably useful at imitating ...

legitster • today at 7:01 AM • 4 replies

AI is in danger of peeing in its own water source. It's unbelievably useful at imitating and generating content, but it needs enough original content to scrape and train on.

Google got one thing wrong and nearly destroyed the internet - people need to have an incentive to contribute content online, and that incentive should not be to game the system for advertising.

This dawned on me in particular when I asked Claude for instructions on taking apart my dryer. There was literally only one webpage left on the internet with instructions for my particular dryer, and it was more or less unusable: full of rotten links and riddled with adware. Claude did its best but filled in the missing diagrams with hallucinations.

I was imagining that LLMs could finally deliver the micropayments solution people have always proposed for the internet. Part of my monthly payment gets split between all of the sites that the LLM scraped knowledge from. Paid out like Spotify pays out artists.

It might not be a lot of money, but it would certainly be more than the pitiful ad revenue you get from posting content online right now. And if I want to upload corrected instructions for repairing this dryer I would have reason to.
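To make the arithmetic concrete, here is a minimal sketch of the kind of pooled, pro-rata split described above. Everything in it is an assumption for illustration: the `split_pool` function, the idea of counting how often each site was drawn on, and the `.example` site names are all hypothetical, and no LLM provider is known to pay out this way.

```python
from collections import Counter


def split_pool(pool_cents: int, sources_used: list[str]) -> dict[str, int]:
    """Split a payout pool across sites in proportion to how often each was used.

    pool_cents   -- hypothetical slice of subscription revenue set aside for sites
    sources_used -- one entry per time a site's content was drawn on (assumed metric)
    """
    counts = Counter(sources_used)
    total = sum(counts.values())
    if total == 0:
        return {}

    # Integer division keeps every payout in whole cents.
    payouts = {site: pool_cents * n // total for site, n in counts.items()}

    # Hand any rounding remainder to the most-used site so nothing is lost.
    remainder = pool_cents - sum(payouts.values())
    if remainder:
        top_site = max(counts, key=counts.get)
        payouts[top_site] += remainder
    return payouts


if __name__ == "__main__":
    usage = [
        "dryer-repair.example",
        "dryer-repair.example",
        "forum.example",
        "wiki.example",
    ]
    print(split_pool(200, usage))
```

With a 200-cent pool and the usage list above, the dryer page was used twice out of four times, so it receives half the pool. How "usage" would actually be measured is the hard, unsolved part and is left out here.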

Replies

someone_eu • today at 10:13 AM

> I was imagining that LLMs could finally deliver the micropayments solution people have always proposed for the internet. Part of my monthly payment gets split between all of the sites that the LLM scraped knowledge from. Paid out like Spotify pays out artists.

This system is usually called taxes.

Which then pay for universal healthcare, free education, affordable housing, libraries, parks, and so on.

LLMs don't need to invent it; we should stop allowing the people and companies behind LLMs to avoid it.

meander_water • today at 9:46 AM

I think most labs actively create synthetic data with an existing model and include it in the pretraining mix for their next model.

Would love to know exactly what the latest process is to keep slop out of training data.

ares623 • today at 7:22 AM

As a software user I wish I could do the same for all the software I use.

intended • today at 9:18 AM

> in danger

It has already done so, and we can be confident in saying that.

Verified content will always be relatively expensive when compared to AI content.

Visits to Wikipedia and most sites have dropped. Rtings has gone full paywall. Ad revenue for producing verified content will be too meager to allow for public consumption.

There are jokes about GenAI being the great filter; while I doubt this, I do hope this is the final push that makes us think about how we want our information commons to be nurtured.
