> LLMs don’t “learn” from the information they operate on, contrary to what a lot of people assume.
Nothing is really preventing this, though. AI companies have already proven they will ignore copyright and any other legal nuisance in order to train their models.
How would they distinguish between real and fake data? It would be far too easy to pollute their models with nonsense.
I mean, is it really ignoring copyright when copyright doesn't limit them in any way when it comes to training?
> Nothing is really preventing this though
The enterprise user agreement is preventing this.
Suggesting that AI companies will uniquely ignore the law or contracts is conspiracy theory thinking.
They're already using synthetic data generated by LLMs to further train LLMs. Of course they won't hesitate to feed in "anonymized" data from user interactions as well. Who's going to stop them, or even prove it's happening? These companies have already been allowed to violate copyright and privacy on a historic, global scale.