All of it belongs to Anna's Archive. They may not have the rights to have it, but the data is there nonetheless.
They're asking for support to cover archival and bandwidth.
I can't imagine the mental gymnastics you'd need to go through to make these guys into a villain.
Anna's Archive themselves scraped together all this data from other sources. See the notes of origin, for example: often they are from zlib or libgen, et cetera.
It’s the exact same mental gymnastics that cause people to accuse model providers of large-scale plagiarism.
That is to say, not that much gymnastics. Like a cartwheel at most.
I don't really care about Anna's Archive, but let's not make them out to be some kind of Robin Hood story.
They have (illegally) scraped and re-hosted mountains of proprietary data and are now deliberately prompt-injecting unwitting LLM users in order to steal money from them too.
If you genuinely can't imagine how anyone would object to somebody taking other people's creative output and distributing it for free against their wishes, then you probably need to work on your imagination a little bit.