Hacker News

ronsor, last Saturday at 8:45 PM

It's true that GPLv3 covers patents, but it is still primarily a copyright license.

The tokenizer's tokens aren't patented, for sure. They can't be trademarked (they don't identify a product or service). They aren't a trade secret (the data is public). They aren't copyrighted (not a creative work). And the GPL explicitly preserves fair use rights, so there are no contractual restrictions either.

A tokenizer is effectively a list of the top-n most common byte sequences. There's simply no basis in law for it to be subject to copyright or any other IP law in the average situation.
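
To make the "top-n most common byte sequences" description concrete, here is a rough sketch in Python. It takes that description literally and is not how any particular tokenizer is actually trained (real BPE-style training merges pairs iteratively). The function name `top_byte_sequences` and the `max_len` cutoff are made up for illustration.

```python
from collections import Counter


def top_byte_sequences(corpus: bytes, n: int, max_len: int = 4) -> list[bytes]:
    """Return the n most frequent byte sequences of length 1..max_len.

    Hypothetical helper: a literal reading of "top-n most common byte
    sequences", not a real tokenizer training algorithm.
    """
    counts: Counter[bytes] = Counter()
    for length in range(1, max_len + 1):
        # Slide a window of the given length over the corpus and count it.
        for start in range(len(corpus) - length + 1):
            counts[corpus[start:start + length]] += 1
    # Highest count first; ties are broken by byte order so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [seq for seq, _ in ranked[:n]]


vocab = top_byte_sequences(b"the cat and the hat", n=5)
```

The output is just a frequency-ranked list derived mechanically from the input data, which is the point being made about the lack of creative expression.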


Replies

kachapopopow, last Saturday at 9:20 PM

I mean, okay, sure, there is no legal framework for tokenizers, but what about the rest of the model? I think there is a much stronger argument there. And you could realistically extend that logic: if the model is GPL-2.0 licensed, you have to provide all the tools needed to replicate it, which would include the tokenizer.