Language is literally an abstraction of sensory inputs and cognitive processes. One can make similar arguments about image generation. These abstractions might characterize the higher cognitive abilities of humans, but it makes no sense to ignore "lower level" cognition. Embodiment is the foundation of our rich internal world models, in particular those of spacetime and causality.
Current generative models merely mimic the output, with a fuzzy abstract linguistic mess in place of any physical/causal models. It's unsurprising that their capacity to "reason" is so brittle.
> Language is literally an abstraction of sensory inputs and cognitive processes.
Language can exist entirely independently of senses and cognition. It is an encoding of patterns in the world, where the only thing that matters is whether anybody or anything wielding it can map the encodings to and from the patterns they encode (which is more of a sociological/synchronisation challenge).
Do C or Java 'make no sense' because they 'ignore lower level cognition'?
There are many parts of non-programming languages that similarly have nothing to do with embodiment. Some of them are even about incredibly abstract things that are impossible in our universe. One could argue that in many fields, genius lies in being able to mentally model what is foreign to the intuition our embodiment has given us, or in finding a mapping that makes that intuition usable. Put differently: the experience our embodiment has given us might limit how well we can understand the world (quantum mechanics, anyone?).
Again, embodiment is interesting and worth pursuing, but far from a requirement for far-reaching intelligence.