The idea of applying clean-room design to model training is interesting... have a "dirty model" and a "clean model", where the dirty model touches restricted content and the clean model works only with the output of the dirty model.
However, setting aside how this sidesteps the fact that current copyright law violates the constitutional rights of US citizens, I imagine there is a very real risk of the clean model losing the fidelity of insight that the dirty model develops from having access to the base training data.
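To make the data flow concrete, here's a purely illustrative toy sketch (a character-bigram model standing in for real LLMs; all names are made up). The dirty model is fit on the restricted corpus, the clean model is fit only on text the dirty model generates, and whatever the sampling step loses is exactly the fidelity concern above.

```python
# Toy sketch of the two-model pipeline described above (all names hypothetical).
# The "dirty" model is a character-bigram model fit on restricted text; the
# "clean" model is fit only on text the dirty model generates, never on the
# restricted corpus itself. Real systems would use LLMs and a distillation-style
# setup, but the data flow is the same.
import random
from collections import Counter, defaultdict

def fit_bigram(text):
    """Count character-bigram transitions."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def sample(model, length, seed="t"):
    """Generate text from a bigram model by weighted sampling."""
    out = [seed]
    for _ in range(length - 1):
        nxt = model.get(out[-1])
        if not nxt:
            break
        chars, weights = zip(*nxt.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

restricted_corpus = "the quick brown fox jumps over the lazy dog " * 50

# Stage 1: the dirty model touches the restricted data directly.
dirty_model = fit_bigram(restricted_corpus)

# Stage 2: the clean model only ever sees the dirty model's output.
synthetic_corpus = sample(dirty_model, length=2000)
clean_model = fit_bigram(synthetic_corpus)

# The clean model's transition table is an approximation of the dirty one;
# whatever got lost in sampling is the "fidelity of insight" worry.
print(sample(clean_model, length=60))
```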
>this sidesteps the fact that current copyright law violates the constitutional rights of US citizens
I think most people sidestep this as it's the first I've heard of it! Which right do you think is being violated and how?