I'm far, far from a mathematician or even an amateur ML/AI practitioner, but to a layman the magic of LLMs (and now, multimodal models) seems clearly to be in the data. IOW, the robot knows stuff, including how to "use" language at all, because it has read lots of stuff that humans have written down over the past several thousand years. IOW, as has often been given as advice to would-be big thinkers, reading and writing are thinking. One can simplify (admittedly, possibly to the point of meaninglessness) and say that the language is really doing the thinking, that humans have been a meat-based substrate for it, and that we now have a new kind of substrate for it in the form of datacenters the size of Connecticut filled with video cards.
So... given that dumb-guy (or, more charitably to myself, humanities-guy-who-happens-to-work-in-tech) understanding of these phenomena, my ears perk up when they say they've trained a model on random numbers but still gotten it to do something semi-useful. Is this as big a deal as it seems? Have we now worked out a way to make the gigawatts' worth of video cards "smart" without human language?