> To call training illegal is similar to calling reading a book and remembering it illegal.
Perhaps, but reproducing the book from this memory could very well be illegal.
And these models are all about production.
Models don’t reproduce books though. It’s impossible for a model to reproduce something word for word because the model never copied the book.
Most of the best-fit curve runs along a path that doesn't even touch an actual data point.
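To make the analogy concrete, here's a toy sketch (made-up data, purely illustrative): fit a line to noisy points and count how many observations the fitted curve actually lands on exactly.

```python
# Toy illustration of the best-fit analogy: the fitted line summarizes the
# data without passing exactly through any individual point.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 3.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)  # noisy observations

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line
fitted = slope * x + intercept

# How many observations does the fitted line hit exactly?
hits = np.isclose(fitted, y, atol=1e-9).sum()
print(f"fit: y = {slope:.2f}x + {intercept:.2f}, exact hits: {hits} of {y.size}")
```

Typically that prints zero exact hits: the fit captures the trend without storing any single point verbatim. Whether that analogy actually holds for large language models is of course the contested part.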
To be fair, that seems to be where some of the AI lawsuits are going. The argument goes that the models themselves aren't derivative works, but the output they produce can absolutely be - in much the same way that reproducing a book from memory could be a copyright violation, a trademark infringement, or generally run afoul of the various IP laws.