Hacker News

diwank 10/11/2024

Grokking is so cool. What does it even mean that grokking exhibits similarities to criticality? As in, what are the philosophical ramifications of this?


Replies

hackinthebochs 10/11/2024

Criticality is the boundary between order and chaos, which also happens to be the boundary at which information dynamics and computation can occur. Think of it like this: a highly ordered structure cannot carry much information because there are few degrees of freedom. The other extreme is too many degrees of freedom in a chaotic environment; any correlated state quickly gets destroyed by entropy. The point at which the two dynamics are balanced is where computation can occur. This point has enough dynamics that state can change in a controlled manner, and enough order so that state can reliably persist over time.
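To make the order/chaos boundary concrete, here is a toy sketch. It uses the logistic map, which is my own choice of example and has nothing to do with neural networks or grokking specifically. The function name and parameters are made up for illustration. The Lyapunov exponent measures how fast nearby states diverge: negative means perturbations die out (order), positive means they blow up (chaos), and values near zero mark the boundary between the two.

```python
import math

def lyapunov_logistic(r, n_transient=1000, n_steps=5000, x0=0.4):
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x).

    Hypothetical helper for illustration only.
    """
    x = x0
    # Discard the transient so the orbit settles onto its attractor.
    for _ in range(n_transient):
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n_steps):
        x = r * x * (1 - x)
        # |f'(x)| = |r*(1-2x)| is the local stretching factor;
        # the floor avoids log(0) if x lands exactly on 0.5.
        total += math.log(max(abs(r * (1 - 2 * x)), 1e-12))
    return total / n_steps

# Ordered regime, near the onset of chaos (roughly r = 3.57), chaotic regime.
for r in (2.8, 3.5, 3.57, 3.9):
    print(r, round(lyapunov_logistic(r), 3))
```

In the ordered regime the exponent is clearly negative, so state persists but nothing new can happen; in the chaotic regime it is clearly positive, so any stored state gets scrambled. The interesting region is where it crosses zero.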

I would speculate that the connection between grokking and criticality is that grokking represents the point at which a network maximizes the utility of information in service of prediction. This maximum would occur when dynamics and rigidity are finely tuned to the constraints of the problem the network is solving, that is, when computation is being leveraged to maximum effect. Presumably this maximum leverage of computation is the point of ideal generalization.
