Hacker News

xdavidliu last Friday at 3:02 PM

open source code is a minuscule fraction of the training data


Replies

TheCraiggers last Friday at 4:21 PM

I'd love to see a citation there. We've known for a few years that they were training AI on projects hosted on GitHub. Meanwhile, I highly doubt software firms were lining up to have their proprietary code bases ingested by AI for training purposes. Even with NDAs, we would have heard something about it.

maplethorpe last Friday at 3:10 PM

Where did most of the code in their training data come from?