Hacker News

voidUpdate, today at 2:22 PM

My preference is that if you need terabytes of data to train an LLM, that data should be used according to its copyright and with the consent of the copyright holder, not just hoovered up from wherever you can find a few more bytes.