Hacker News

visarga, last Saturday at 11:53 AM

You can train an LLM on completely clean data, Creative Commons and legally licensed text, and at inference time someone will just put a whole article or chapter into the model's context and have full access to regenerate it however they like.


Replies

saghm, last Saturday at 4:29 PM

Re-quoting the section the parent comment included from this agreement:

> > GPAI model providers need to establish reasonable copyright measures to mitigate the risk that a downstream system or application into which a model is integrated generates copyright-infringing outputs, including through avoiding overfitting of their GPAI model. Where a GPAI model is provided to another entity, providers are encouraged to make the conclusion or validity of the contractual provision of the model dependent upon a promise of that entity to take appropriate measures to avoid the repeated generation of output that is identical or recognisably similar to protected works.

It sounds to me like the LLM you describe would be covered if the people distributing it put a clause in the license saying that people can't do that.