Git is part of the LLM's training set though, so simply asking it to recreate git in another language is pretty much equivalent. Like, you can almost certainly get these LLMs to output git's full source code with some prompting, so there's not that much difference (as much as we like to pretend that AI-generated code has no copyright implications).
> Like, you can almost certainly get these LLMs to output git's full source code with some prompting, so there's not that much difference (as much as we like to pretend that AI-generated code has no copyright implications).
Are you sure? LLMs are in some way a compressed version of their input, but it's a pretty lossy compression (arguably this makes them more like a compression algorithm than a compressed version of the data). I'm not sure you can prompt a full, accurate copy of a nontrivial codebase out of them. Even at zero temperature their accuracy is just not that high.
That's something I have been wondering about. If I as a human want to make a clean-room reimplementation of some API or application, I must not have read the source code of the original implementation. I don't see why this shouldn't apply to LLMs as well. If an LLM might have been trained on the original source code, it should be considered "tainted".