I know for a fact that all SOTA models have Linux source code in them, intentionally or not, which means they should follow the GPL license terms and open-source the parts of the models that constitute derivative works of it.
Yes, this is indirectly hinting that during training the GPL-tainted code touches every single floating-point value in a model, making it a derivative work; even the tokenizer isn't immune to this (see the toy sketch below).
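To make the "touches every floating-point value" claim concrete, here's a minimal sketch in PyTorch. The model, the snippet, and the numbers are all hypothetical stand-ins, not any actual SOTA model or training pipeline; it just shows that one gradient step on a single training example produces nonzero updates across most of a model's parameters rather than a localized region.

```python
# Toy illustration (hypothetical tiny model): how diffusely one training
# example spreads through a model's parameters via a single gradient step.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in for a language model: byte embedding -> transformer block -> head.
model = nn.Sequential(
    nn.Embedding(256, 32),                                          # byte-level vocabulary
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    nn.Linear(32, 256),
)

# One "training example" -- imagine it is a snippet of GPL-licensed C code.
snippet = b"static int __init my_module_init(void) { return 0; }"
ids = torch.tensor([list(snippet)])                                 # shape (1, seq_len)

logits = model(ids)                                                 # forward pass
loss = nn.functional.cross_entropy(                                 # next-byte prediction loss
    logits[:, :-1].reshape(-1, 256), ids[:, 1:].reshape(-1)
)
loss.backward()

# Count how many parameters received a nonzero gradient from this one example.
touched = sum((p.grad != 0).sum().item() for p in model.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"{touched}/{total} parameters get a nonzero gradient from one snippet")

# Caveat: embedding rows for bytes absent from the snippet get a zero gradient,
# so "every single value" is an overstatement in the strict sense -- but the
# update is spread across essentially the whole network, not a copy stored
# in one identifiable place.
```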
When you say “in” them, are you referring to their training data, or their model weights, or the infrastructure required to run them?
> the tokenizer isn't immune to this
A tokenizer's set of tokens isn't copyrightable in the first place, so it can't really be a derivative work of anything.