Here is my (hot take) proposal for regulation:
1) *All major players open source their unobfuscated training data.*
a) The evidence so far shows that every major AI company engaged in intentional and historically unprecedented copyright violation to obtain their training data.
b) LLMs have now poisoned future data for any new players. This is a massive negative externality, and we shouldn't accept it as a moat that locks future players out of competition.
2) *Levy a 20% royalty on all future genAI revenue, paid to the authors and artists who appear in the dataset, and exempt genAI from future copyright violations.*
a) The current copyright model is bad for both authors and AI companies. It's hard for authors to collect from violations, and it's expensive and tedious for AI companies to comply with innumerable individual copyrights. Simplify the regime for everyone, and properly reward the people whose work is the foundation of these models.
b) The specifics can be worked out, but, among other things, the royalty should be proportional to the token count of a work, not just the number of works.
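
To make 2b concrete, here is a rough sketch of a pro-rata split by token count. Everything in it is an assumption for illustration: the function name `split_royalty_pool`, the flat 20% pool taken from revenue, and the made-up works and token counts. A real scheme would need to settle details like deduplication and which tokenizer counts.

```python
from decimal import Decimal


def split_royalty_pool(revenue, token_counts, rate=Decimal("0.20")):
    """Split rate * revenue across works in proportion to their token counts.

    Hypothetical helper: token_counts maps a work's identifier to the number
    of tokens that work contributes to the training dataset.
    """
    total_tokens = sum(token_counts.values())
    if total_tokens <= 0:
        raise ValueError("dataset must contain at least one token")
    # The royalty pool is a fixed fraction of revenue (20% in the proposal).
    pool = Decimal(revenue) * rate
    # Each work's share of the pool equals its share of the total tokens.
    return {
        work: pool * tokens / total_tokens
        for work, tokens in token_counts.items()
    }


# Made-up numbers: one long novel and one short story in the dataset.
shares = split_royalty_pool(
    1_000_000, {"novel": 90_000, "short_story": 10_000}
)
```

With these made-up numbers the pool is 200,000, of which the novel gets 180,000 and the short story 20,000. A split by number of works would have given each 100,000, which is the outcome 2b argues against.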