Hacker News

kachapopopow · yesterday at 6:00 PM

I know for a fact that all SOTA models have Linux source code in them, intentionally or not, which means they should follow the GPL license terms and open-source the parts of the models that are derivative works of it.

Yes, this is indirectly hinting that during training the GPL-tainted code touches every single floating-point value in the model, making it a derivative work; even the tokenizer isn't immune to this.
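
To make that "touches every value" claim concrete: under dense gradient descent, a single training example produces a gradient that is (almost surely) nonzero in every entry, so every parameter moves. A minimal sketch of one SGD step on a toy linear model, numpy only; the shapes and data here are made up for illustration:

    # Toy illustration: one SGD step on a dense model. The per-example
    # gradient is an outer product, so it is nonzero in every entry and
    # updates every weight.
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8))   # stand-in for "the model weights"
    x = rng.normal(size=8)        # features of one (hypothetically GPL) sample
    y = rng.normal(size=4)        # training target

    err = W @ x - y               # squared-error residual
    grad = np.outer(err, x)       # dense: one example touches all of W
    W = W - 0.01 * grad           # every floating-point value just changed

    print((grad != 0).all())      # True (almost surely, for random data)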


Replies

ronsor · yesterday at 6:01 PM

> the tokenizer isn't immune to this

A tokenizer's set of tokens isn't copyrightable in the first place, so it can't really be a derivative work of anything.
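
For reference, the token set in question is essentially a frequency table: BPE-style tokenizers repeatedly promote the most frequent adjacent symbol pair in the training corpus to a new token. A minimal single-merge sketch (the corpus string is invented for illustration):

    # One BPE merge step: count adjacent symbol pairs in a corpus and
    # promote the most frequent pair to a new token.
    from collections import Counter

    corpus = "int main(void) { return 0; }"   # made-up stand-in corpus
    symbols = list(corpus)

    pairs = Counter(zip(symbols, symbols[1:]))
    best = max(pairs, key=pairs.get)          # most frequent adjacent pair
    print("new token:", "".join(best), "count:", pairs[best])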

chaos_emergent · today at 12:31 AM

When you say “in” them, are you referring to their training data, their model weights, or the infrastructure required to run them?