While I'm not forgetting the spirit of what Git is, I'm also remembering how GitHub used "all open repositories" to train their first Copilot without telling anyone.
So, no thanks. I'll not be committing any personal code there anymore.
And no, I don't care for the social aspects either. Discoverability, stars, and AI bot powered issue bombardment.
I'm fine like this.
Also, remember, "Open Source is not about You".
What exactly did they train? Copilot is powered by claude, gemini, or ChatGPT these days.
Did they train autocomplete? I mean the code is open source so anyone can scrape it and train it too. I'm kind of glad they did train it because otherwise we'd still be stuck with Apple level AI models right now.
The whole reason we have so many models, including open weight models, that are all competitive with each other is because the data is free and anyone can be training off it. If the goal was to monetize the source code I guess the authors shouldn't make it open source.
It did so in direct violation of the licenses of the code held there as well and then sold code snippets they had no rights to and still do.
I mean, I never put my code on GitHub, but other people put it there, as they use GitHub: you can't not use GitHub. (Hell: even closed source projects, even ones that were never distributed even as a binary, if the code leaks, end up mirrored on GitHub.)
Agreed
Don't forget a achievement badges.
The training on "all open repositories" is the only training we heard about. I wouldn't be surprised if it wasn't beneath these greedy companies to train on other data, and respond "oops! we didn't that would happen" (by which they mean get found out).
Leaving is still the right move. But this applies to all centralized large services: Our use of Google and Google Drive, any Microsoft products, Adobe products, etc.
> (...) I'm also remembering how GitHub used "all open repositories" to train their first Copilot without telling anyone.
This is a silly opinion to hold, isn't it? I mean, you release projects under a license with the express purpose of freely distributing your code among anyone in the world that may have any interest whatsoever, and even allow they themselves to share it with anyone they feel fit. But you are somehow outraged if people actually use said code?
Please make it make sense.
I completely share your sentiment about feeling irked about open source code being used to train commercial AI models. However, I think the battle is already lost - the nature of copyright and open source code philosophy (currently) means that there isn't any real way of preventing your code being used to train AI. Look at the legal precedents being set in courts where many of the BigTechs have actually pirated copyrighted media to train their AI, and the court has said "that's acceptable". (Ofcourse, the actual act of piracy - like Facebook did by downloading copyrighted material through torrents - may not be legal, but the courts may be lenient here too as there seems to be an undercurrent of government approval to do anything to win the "AI Race").
And, even if you move your repository somewhere else, can you really prevent anyone from uploading it to Github? To do so, you may have to create your open source license.