I don't get the text obsession beyond LLMs being immensely useful that you might as well use LLM for <insert tasks here>. I believe that some things live in text, some in variable size n-dimensional array, or in fixed set of parameters, and so on - I mean, our brains don't run on text alone.
But our brains do map high-dimensionality input to dimensions low enough to be describable with text.
You can represent a dog as a specific multi-dimensional array (raster image), but the word dog represents many kinds of images.