> Training is not redistribution. It's the exact same as you as a person learning to program from proprietary secret code, and then writing your own original code independently.
Well the difference is that copyright law applies to work fixed in a tangible medium of expression. This covers i.e. model weights on a hard drive but not the human brain. If the model is able to reproduce others’ work verbatim (like the example the article brings up of the song lyrics) then under copyright law that’s unauthorized reproduction. It doesn’t matter that the data is expressed via probabilistic weights because due to past lobbying/lawsuits by the software industry to get compiled binary code covered by copyright, reproduction can include copies that aren’t directly human readable.
> If the material is publicly accessible without protection, you have no reasonable expectation to exclusive control over its use.
There’s over 20 years of successful GPL infringement lawsuits over unlicensed use of publicly available GPL code that disagrees with this point.