Human sensory data doesn't correspond -- not neatly, and probably not at all -- to LLM training data.
Human sensory data combines into a spatiotemporal sense: the overarching sense of being a bounded entity in time and space. From these perceptions, one can generalize and make predictions. The stronger one's capacity for cognition, the broader and more accurate those generalizations and predictions become. Every invention, including and perhaps especially the invention of mathematics, is rooted in this.
LLMs have no apparent spatiotemporal sense, are not physically bounded, and cannot model the physical world. They're trained on static communications -- and, of course, they can model those: they can predict word sequences and produce output that mirrors previously communicated ideas. But there's something huge staring us right in the face: they're clearly not capable of producing anything genuinely new of any significance.
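To make "predicting word sequences" concrete, here's a deliberately crude sketch (nothing like a transformer, and purely illustrative): a bigram model fit to a static snippet of text. By construction, it can only emit word transitions it has already seen -- it recombines its training data rather than producing anything new.

```python
import random
from collections import defaultdict

# Toy bigram "language model": count which word follows which in a
# fixed corpus, then sample continuations from those counts.
# It can only ever reproduce transitions present in its training text.

corpus = "the cat sat on the mat and the dog sat on the rug".split()

transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start: str, length: int = 8) -> str:
    """Walk the bigram table, sampling one next word at a time."""
    words = [start]
    for _ in range(length):
        options = transitions.get(words[-1])
        if not options:  # dead end: this word never precedes another in the corpus
            break
        words.append(random.choice(options))
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat on the rug"
```

Real LLMs generalize enormously better than this, but the training signal is the same kind of thing: next-token statistics over static text.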
This is why the path to AGI probably runs through world models.