Given the amount of web scraping LLM providers have been doing, I'd say it's likely that any code that is publicly accessible on the internet has been incorporated into their training data, whatever its license.