Paper went over my head but is this in any way related to my experience of Claude Opus 4.8 using increasingly terse language with very short, overloaded words? Lately I've been having trouble parsing the things it writes about my own code, it's using the kind of compressed language that you see typically in git commit message subject lines but relentless, always on.
No, this is in the same ballpark as ideas like big-O notation. The paper is saying that transformers can recognize a language with exponentially fewer symbols than other kinds of systems, i.e. they're more succinct.
It's exactly as related to real models as computer science is to real computers.