The elephant in that room is that all these LLMs were trained on boatloads of open source software, which they can remix just enough to avoid violating any copyrights.
As an open source contributor, in some ways this frustrates me much more than someone making a closed-source fork of a BSD-licensed project.
My take for a very long time has been that any model trained in violation of copyright should not itself be copyrightable. It should be public domain.
This would mean that anyone who trained a model without permission to create a derivative work, whether implied by the work’s current license or obtained directly, would have to release the model’s weights.
You could argue that it’s fair use, but a fair use quotation of a work does not become the property of the one quoting it. If I quote a line from a song or a novel, I do not now own the rights to that line. So there’s precedent for this.