
noosphr, today at 4:53 AM

As opposed to integers in normal programming.

The 'magic' in weights is that the rules are spread through the whole model and you can't point to one place which encodes them.

The grokking paper shows that this stops being the case with enough training data and enough compute.


Replies

throw310822, today at 7:35 AM

Integers in normal programming represent data or instructions; instructions are hand coded, have rigidly defined semantics, are not differentiable and have no redundancy.

> The 'magic' in weights is that the rules are spread through the whole model ... The grokking paper shows that this stops being the case with enough training data and enough compute.

I don't understand what you mean to say. That weights are not magic? That weights are not weights? NNs are made up of weights, which are learned and not coded. The fact that they do learn world models (grammar rules in your example), and that these models' weights tend to roughly concentrate by function and level of representation, is perfectly logical but even more amazing. (Notice that much of the dismissive attitude towards LLMs depicts them as pure syntactic manipulators without the ability to develop world models, the exact opposite of what you point out.)
