Maybe pure language models aren't world models, but Genie 3, for example, seems to be a pretty good world model:
https://deepmind.google/discover/blog/genie-3-a-new-frontier...
We also have multimodal AIs that can handle both language and video. Genie 3 combined with language in a single multimodal model might be pretty impressive.
Focusing only on what pure language models can do is a bit of a straw man at this point.