> How does anthropic know my "kernel" project isn't a personal toy and not the Linux kernel?
The Linux Kernel is in its training data. I just tested it. I copied about 20 random lines from the linux kernel and asked which codebase this was from and it could immediately tell.
The Linux kernel is also in the free bsd project. I'm allowed to copy as little or as much of the kernel as I like into my personal project thanks to the GPL.
Being able to attribute the source of a line of code doesn't help you to know if a repository can be legitimately hacked on.
As you could imagine, I might just take all or part of the Linux USB stack from the kernel to retrofit it into my own kernel.