Hacker News

leptons, yesterday at 12:34 AM

> LLMs don’t “learn” from the information they operate on, contrary to what a lot of people assume.

Nothing is really preventing this though. AI companies have already proven they will ignore copyright and any other legal nuisance so they can train models.


Replies

lioeters, yesterday at 12:39 AM

They're already using synthetic data generated by LLMs to further train LLMs. Of course they won't hesitate to feed in "anonymized" data generated by user interactions. Who's going to stop them? Or even prove that it's happening? These companies have already been allowed to violate copyright and privacy on a historic, global scale.

Archelaos, yesterday at 12:42 AM

How would they distinguish between real and fake data? It would be far too easy to pollute their models with nonsense.

tick_tock_tick, yesterday at 12:48 AM

I mean, is it really ignoring copyright when copyright doesn't limit them in any way on training?

Aurornis, yesterday at 1:52 AM

> Nothing is really preventing this though

The enterprise user agreement is preventing this.

Suggesting that AI companies will uniquely ignore the law or their contracts is conspiracy-theory thinking.
