Even if the output is blocked, if it can be demonstrated that the copyrighted material is still present in the model, you become liable for distribution and/or duplication without a license.
Training on synthetic data is interesting, but how do you generate the synthetic data? Is it turtles all the way down?