> It is unsurprising that an LLM performs better than random! That's the whole point. It does not imply emergence.
By definition, behavior is emergent when the model shows the ability to synthesize solutions to problems it wasn't trained on, i.e. it can generalize.
Emergent behavior would imply that some other function was being reduced to token prediction. Behaving "better than random" (i.e. not just brute forcing) would not qualify: token prediction is not brute forcing, and we expect it to do better because it's trained to do so.
If you want to demonstrate an emergent behavior, you're going to need to show that.