That's why the major AI labs are really careful about the code they include in their training runs.
The days of indiscriminately scraping every scrap of code on the internet and pumping it all in are long gone, from what I can tell.
Do you have any pointers on this?
Would be a great resource to understand what works and what doesn't.
Well, if, as the OP points out, it is 'all garbage', they don't have a whole lot of room to discriminate.