Anthropic bought millions of books and scanned them, meaning that (at least for those sources) the material was legally obtained. Rampant piracy has also been used to obtain similar material, which I won't defend. But it's not an absolute: training can be done on legally acquired material.