Hacker News

pabs3 today at 1:17 PM

> all software ever written

LLMs aren't usually trained on large proprietary codebases like the ones from Google, Microsoft or Apple?


Replies

salawat today at 2:45 PM

You think there wasn't a reason Microsoft bought GitHub, whose ToS allowed them to expand their training corpus vastly beyond their own internal systems? Or why Amazon does the same thing with CodeCommit? If your stuff is hosted somewhere with a ToS, you can bet that repo is getting into the training corpus. Having your own flavor of LLM in today's market is too valuable for any corp to pass up the opportunity.