Presumably it had access to GCC (and LLVM/Clang) sources in its training data? All of which are hosted or mirrored on GitHub.
And all of which are in an entirely different language, and which use pretty different architectures to this compiler.