In general terms, we keep getting results that seem to indicate that LLMs can’t really “create” new information through inference: LLM-generated skills don’t help, training on LLM-generated content causes model collapse, and so on. These results seem to be accepted as intuitively obvious.
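(The model-collapse claim at least has a simple toy analogue, sketched below. This is purely illustrative, fitting a Gaussian rather than an LLM: each “generation” is trained only on samples from the previous generation’s fit, and finite-sample estimation error compounds until the fitted distribution degenerates.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration (not an LLM): each "generation" is fit only on samples
# drawn from the previous generation's fitted Gaussian. Finite-sample
# estimation error compounds: the mean drifts and the variance tends to
# shrink, so the fitted distribution slowly degenerates.
n = 20                                    # samples per generation (small, to make the effect visible)
real_data = rng.normal(0.0, 1.0, size=n)
mu, sigma = real_data.mean(), real_data.std()

for gen in range(1, 51):
    synthetic = rng.normal(mu, sigma, size=n)   # "train" only on model output
    mu, sigma = synthetic.mean(), synthetic.std()
    if gen % 10 == 0:
        print(f"generation {gen:2d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")
```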
But this seems pretty surprising to me. The training corpus contains so much information, and the models operate at the level of… a bright novice. It seems like there obviously ought to be more insight to derive by looking harder at the corpus.
Why isn’t this considered astonishing?
The training corpus is only learned approximately and incompletely during pretraining. You can use inference-time compute to try to compensate, but that can at best make the model somewhat more self-consistent; it cannot recreate information that was never learned effectively to begin with!
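A rough way to see that last point (a toy sketch, not a claim about any particular model): treat “the model learned the fact” as the probability it assigns to the right answer, and “inference-time compute” as majority voting over repeated samples. Voting sharpens whatever the model already believes; it can’t recover a fact the model never learned, and it actively entrenches a mislearned one.

```python
import numpy as np

rng = np.random.default_rng(0)

def vote_accuracy(p_correct: float, n_options: int, k: int, trials: int = 10_000) -> float:
    """Accuracy of majority voting over k samples from a model that puts
    p_correct on the right option and spreads the rest uniformly."""
    probs = np.full(n_options, (1.0 - p_correct) / (n_options - 1))
    probs[0] = p_correct                          # option 0 is the correct answer
    wins = 0
    for _ in range(trials):
        counts = np.bincount(rng.choice(n_options, size=k, p=probs), minlength=n_options)
        best = np.flatnonzero(counts == counts.max())
        wins += rng.choice(best) == 0             # break ties at random
    return wins / trials

# p = 0.60: fact learned shakily; p = 0.25: not learned (chance level, 4 options);
# p = 0.10: learned wrong. More samples only amplify what pretraining put there.
for p in (0.60, 0.25, 0.10):
    accs = [vote_accuracy(p, n_options=4, k=k) for k in (1, 5, 25)]
    print(f"p_correct={p:.2f}  k=1/5/25 -> " + "  ".join(f"{a:.3f}" for a in accs))
```

In this sketch, the shakily learned fact gets sharpened toward near-certainty as k grows, the unlearned one stays at chance, and the mislearned one gets worse, which is the “more self-consistent, not more informed” point in miniature.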