Integers in normal programming represent data or instructions; instructions are hand coded, have rigidly defined semantics, are not differentiable and have no redundancy.
> The 'magic' in weights is that the rules are spread through the whole model ... The grokking paper shows that this stops being the case with enough training data and enough compute.
I don't understand what you mean to say. That weights are not magic? That weights are not weights? NNs are made up of weights, which are learned and not coded. The fact that they do learn world models (grammar rules in your example), and that these models' weights tend to roughly concentrate by function and level of representation, is perfectly logical but even more amazing. (Notice that much of the dismissive attitude towards LLMs depicts them as pure syntactic manipulators without the ability to develop world models, which is the exact opposite of what you point out.)
>Integers in normal programming represent data or instructions; instructions are hand coded, have rigidly defined semantics, are not differentiable and have no redundancy.
I can write, and have written, programs using an evolutionary algorithm that then run on bare metal. None of the things you list are true for those programs, yet, other than being computationally more expensive to train, they work just as well as neural networks.
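To make that concrete, here is a minimal sketch of the general idea, not the actual programs referred to above. It assumes a made-up four-opcode accumulator machine (the `OPS` table, `run`, `error` and `evolve` are all hypothetical names) and a simple (1+1) mutation search. The genome is just a list of integers, nobody writes the instruction sequence by hand, and the no-op opcode means many different genomes compute the same function.

```python
import random

# Hypothetical 4-instruction machine: each integer in the genome is an opcode
# acting on a single accumulator. Only this decoder is written by hand.
OPS = [
    lambda a, x: a + x,   # 0: add the input
    lambda a, x: a + 1,   # 1: increment
    lambda a, x: a * 2,   # 2: double
    lambda a, x: a,       # 3: no-op (gives the genome redundancy)
]

def run(program, x):
    """Interpret a list of integer opcodes, starting with accumulator 0."""
    acc = 0
    for op in program:
        acc = OPS[op % len(OPS)](acc, x)
    return acc

def error(program, cases):
    """Total absolute error over (input, expected) pairs; lower is better."""
    return sum(abs(run(program, x) - y) for x, y in cases)

def evolve(cases, length=6, generations=2000, seed=0):
    """(1+1) evolutionary search: mutate one opcode, keep the child if not worse."""
    rng = random.Random(seed)
    best = [rng.randrange(len(OPS)) for _ in range(length)]
    best_err = error(best, cases)
    for _ in range(generations):
        child = list(best)
        child[rng.randrange(length)] = rng.randrange(len(OPS))
        child_err = error(child, cases)
        if child_err <= best_err:
            best, best_err = child, child_err
    return best, best_err

if __name__ == "__main__":
    # Target behaviour f(x) = 2x + 1, specified only through examples.
    cases = [(x, 2 * x + 1) for x in range(-5, 6)]
    program, err = evolve(cases)
    print(program, err)
```

Whether the search reaches zero error depends on the seed and the generation budget; the point is only that the integers are found by selection, not written by a programmer.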
>I don't understand what you mean to say
The diffuseness of weights across the whole model isn't an innate feature of deep learning models. It is a feature of sparse training data and little compute.