How does it differ from pirating music or movies?
There is no intellectual property to “pirate”.
Model outputs don't qualify for copyright. They aren’t patented. They aren’t trade secrets - the companies sell them. They aren’t trademarks, obviously. They are nothing, actually.
AI training is considered transformative. That's how AI training gets around copyright, and it's probably consistent with copyright precedent. For example, indexing the web is considered transformative, even though you can recover the full text of every document from a positional inverted index.
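To make the inverted-index point concrete, here is a minimal sketch, assuming a positional index (one that stores where each token occurs, not just which documents contain it). The function names and the two toy documents are made up for illustration, and the round trip only recovers whitespace-normalized text, not the original formatting.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each token to {doc_id: [positions where the token occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.split()):
            index[token][doc_id].append(pos)
    return index


def reconstruct(index, doc_id):
    """Rebuild a document's token sequence using only the index."""
    slots = {}
    for token, postings in index.items():
        # .get avoids creating empty entries in the defaultdict
        for pos in postings.get(doc_id, []):
            slots[pos] = token
    return " ".join(slots[i] for i in sorted(slots))


# Hypothetical toy corpus
docs = {"a": "the quick brown fox", "b": "the lazy dog sleeps"}
index = build_positional_index(docs)
print(reconstruct(index, "a"))
```

An index that stores only document IDs per term (no positions) would give you back the set of words in each document, but not their order.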
Machine-extruded text is not copyrightable, since there was no human creativity involved in producing it.
(and if you argue the US models do produce copyrighted works, then oooops - whose copyright is it huh?)
Ow my head.
That when I pay for a model, the copyright in the output belongs to me. This is as work-for-hire as it gets.
According to US AI labs, training on other people's output is fair use. So that's how.