You are saying: it turns out that one-shot playable Minecraft clones are actually pretty simple. Maybe it seemed like a hard problem to programmers, but why not just say that verifiable-rewards training has shown that their skill is unusually simple?
I am saying that 3 years ago there wasn't a snowball's chance in hell I could one-shot a playable Minecraft clone, and there has never been a snowball's chance in hell a human developer could do so in 45 minutes.
The difficulty of the task and human performance on that task haven't changed. LLM performance on that task has changed dramatically.
What a great example of the no true Scotsman fallacy.
Can't wait to see what unusually simple Erdős problem LLMs will expose next, hiding in plain sight for decades and seemingly intractable for professional mathematicians who weren't aware of just how simple the problem was.