I'd encourage you to look at the Software Heritage archive as an example of how diverse software sources are outside of GitHub. Even that doesn't cover everything: many repos aren't archived there yet, some repo formats aren't supported yet, some code isn't in repos at all, and some people refuse or block archiving of their code.