>but simply because they operate at the level of words, not facts. They are language models.
Facts can be encoded as words. That's something we do a lot ourselves for the facts we learn, gather, and convey to other people. 99% of university is learning facts, theories, and concepts from reading and listening to words.
Also, even when different people directly observe the same fact, they can interpret it in different ways, whether that happens as raw "thought" or at the conscious verbal level. And that's before we even add value judgements to it.
>All they store are language statistics, boiling down to "with preceding context X, most statistically likely next words are A, B or C".
And how do we know we don't do something very similar with our facts - build a map of facts and concepts, with weights between them, for retrieving and associating them? Perhaps even encoding what we think of as our "analytic understanding" in a similar way.
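For concreteness, that "preceding context X, most likely next words" picture is literally what a model exposes. A minimal sketch, assuming the Hugging Face transformers library and the public GPT-2 checkpoint (my choice purely for illustration, not anything from the comment above):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load a small public model purely for illustration.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    context = "The capital of France is"
    inputs = tokenizer(context, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits      # (1, seq_len, vocab_size)

    # Probability distribution over the vocabulary for the NEXT token.
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=5)

    for prob, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(token_id.item())!r}: p={prob.item():.3f}")

Whether the weights that produce such a distribution count as "encoding facts" is exactly what's in dispute.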
Animal/human brains and LLMs have fundamentally different goals (or loss functions, if you prefer), even though both are based around prediction.
LLMs are trained to auto-regressively predict text continuations. They are not concerned with the external world or any objective, experimentally verifiable facts - they are just self-predicting "this is what I'm going to say next", having learnt that from the training data (i.e. "what would the training data say next").
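In loss-function terms, that objective is just next-token cross-entropy over the training text. A generic PyTorch sketch (not any particular lab's training code; `model` is assumed to map token ids to vocabulary logits):

    import torch
    import torch.nn.functional as F

    def next_token_loss(model, token_ids):
        """token_ids: (batch, seq_len) tensor of tokenized training text."""
        inputs  = token_ids[:, :-1]   # the context seen so far
        targets = token_ids[:, 1:]    # what the training text actually says next
        logits  = model(inputs)       # (batch, seq_len - 1, vocab_size)
        # Nothing here refers to the external world - the "ground truth"
        # is just the next token of the text itself.
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
        )
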
Humans/animals are embodied and live in the real world, and their design has been honed by a "loss function" favoring survival. Animals are "designed" to learn facts about the real world and to react to those facts in ways that help them survive.
What humans/animals are predicting is not some auto-regressive "what will I do next", but rather what will HAPPEN next, based largely on outward-looking sensory inputs, but also on internal ones.
Animals are predicting something EXTERNAL (facts); LLMs are predicting something INTERNAL (what they will say next).