It's not settled law as it pertains to LLMs, but, yes, creating a "statistical summary" of a book (e.g., a concordance of Joyce's "Ulysses") is generally protected as fair use. However, obtaining pirated copies of books in order to create that concordance is still illegal.