When I buy a book, I don't buy a license to read it. I don't sign an EULA that says I won't scan it, digitize it, or write a program to analyze the word frequencies it contains. Do you want to buy a license to read a book? Because this is how you get there.
The old rules were built on old capabilities and an old reality that no longer exists.
In Spain, books include a copyright notice explicitly prohibiting reproduction and digitization, and alluding to Article 270 of the Spanish Criminal Code.
Of course you don’t, because it’s not the EULA that enforces the copyright. Copyright law is what enforces the EULA. It’s right there in the fact it’s a Licensing Agreement.
When I buy a patented product I don't sign an EULA that says I can't manufacture and sell a copy, but I still can't manufacture and sell a copy.
The law has always been able to recognize a distinction between Hunter S. Thompson reading Ernest Hemingway and learning from his style, and a billion GPUs reading a billion books to be able to produce that style on demand. It takes time for the law to catch up to the technology, but it will.
Perhaps it's that the transaction for you, an individual not explicitly profiting from the work, should be treated differently than a corporation using a work solely to profit from it.
The problem isn’t the reading. The problem is the output based on somebody else’s work.
There is a reason we call them styles: a style is a recognizable pattern someone came up with, maybe after decades of work.
It is not an individual buying the book but a corporation, with the purpose of being able to create imitations of it, and all other books.
Copyright quite literally protects the act of copying or reproducing a work protected by copyright. And you are technically entering into something akin to an end user licensing agreement when you buy a book, the only difference being that the EULA is incorporated into law on an international basis through reciprocal copyright treaties.
So if you scan a book, you are making a copy. In some copyright jurisdictions this is allowed for individuals under a private copying exception (a copyright opt-out, if you like), but the important thing is private use. In some jurisdictions there is also a fair use exception, which allows you to exploit the rights protected by copyright under certain circumstances. But fair use is quite specific, and one big issue with it is that the rights you are exploiting cannot result in something that competes with the original work.
Other acts restricted by copyright include distribution, adaptation, performance, communication and rental.
So if you copy a book, digitize it, and write a program to analyze the word frequencies it contains, you may, in some jurisdictions but not all, be allowed to do this.
If you’re doing it locally on your own machine, you are simply copying it. If you do it in the cloud, you are copying it and communicating the copy. If you copy it, analyze it, and train an AI model on it, that could be considered fair use in certain jurisdictions. Whether the outputs are adaptations of the training data is a matter of debate in the copyright community.
But importantly, if you commercialise that model and the resulting outputs compete with the copyright-protected material you used to train it, your fair use argument may fail.
So when you buy a book, you are actually party to what is effectively a licence granted by the copyright holder, albeit to the publisher. But as the end user of the book you are still restricted in what you can do with that copyright-protected work, through a universal end user licence encoded in law.
You don't sign an EULA saying you can't do those things because scanning then distributing is already prohibited by copyright. The way to start a license war is to keep the status quo of these companies being able to ingest and essentially reproduce human work for free. One of my big worries about AI is that it will accelerate companies locking everything down and hoarding their own data.