I've done this project myself, based on Ubuntu 20.04 and a whole lot of patchsets [0]. I got up to the 2014-01-20 snapshot before running into weird LLVM stack issues that I couldn't figure out how to resolve. One big annoyance is that the snapshot file refers to some commit hashes that do not appear to point to any surviving public repo, so it takes a fair bit of effort to reconstruct which changes must have been included in those missing commits.
> the snapshot file refers to some commit hashes that do not appear to point to any surviving public repo
That sounds a bit worrying from a "Reflections on Trusting Trust" perspective. Who's to say that those non-public commits didn't introduce a compiler backdoor? But I guess the more likely explanation is that somebody did some last-minute hotfixes that were later reworked before inclusion in the permanent record.