empath75 yesterday at 3:52 PM

They do memorize some books. You can test this trivially by asking ChatGPT to produce the first chapter of something in the public domain -- for example, A Tale of Two Cities. It may not be word-for-word exact, but it'll be very close.

These academics were able to get multiple LLMs to reproduce large amounts of text from Harry Potter:

https://arxiv.org/abs/2601.02671


Replies

threethirtytwo yesterday at 3:56 PM

In that case I would say it is the act of reproducing the books that is illegal. Training the AI on said books is not.

So the illegality rests at the point of output and not at the point of input.

I’m just speaking in terms of the technical interpretation of what’s in place. My personal views on what it should be are another topic.
