A few things to note:
1. In the real world, for a similar task, there is little reason not to give the LLM access to all the papers about optimizations, the ISA PDFs, and MIT-licensed compilers of all kinds. It would perform much better, and this shows that the "uncompressing GCC" argument is just a claim (point 2 shows it even more).
2. Of all the tasks, the assembler is the part where memorization would help the most. Instead, the LLM can't perform without the ISA documentation, which it saw repeated countless times during pre-training. Guess what?
3. Rust is a bad language for the test as a first target. If you want an LLM-coded C compiler in Rust, and you have LLM experience, you would go: C compiler -> Rust port. Rust is hard when there are mutable data structures with tons of references around, and a C compiler is exactly that. Composing complexity from different layers is an LLM anti-pattern that anyone who has worked a lot with automatic programming knows very well.
4. In the real world, you don't do a task like that without steering, and steering will do wonders. This is not to say that the experiment was ill-conceived. The fact is that the experimenter was trying to make a different point from the one the Internet took away (as usual).