This is probably the most interesting (and insightful) paper on grokking I’ve read recently: https://arxiv.org/abs/2402.15555