It feels like they're focused on overstating how confusing and weird it is that an LLM can write code but not play games very well, rather than just explaining why.
Code is text. LLMs are text input/output machines.
Game input/output is not at all text.
LLMs can certainly reason about games with a simple/explicit enough domain (try a Risk tournament where models can talk to each other between turns!).
But LLMs are terrible at text adventures too. See e.g. https://entropicthoughts.com/updated-llm-benchmark and previous articles referenced in there.
I have yet to see any sort of harness that lets a frontier LLM interact with a text adventure and make meaningful progress on its own.
LLMs are used in OpenClaw and similar tools to do tasks for their users.
Games are a bunch of tasks too.
So if they fail at game tasks, maybe it's a bad idea to advertise those LLMs as task-doing assistants.
The other reason is lack of continual learning, especially for long games like RPGs.