I think it's hard to take any LLM criticism seriously if the critic doesn't even specify which model they used. Saying "an LLM model" is totally useless for deriving any kind of conclusion.
Yes, I’d be curious about his experience with the GPT-5 Thinking model. So far I haven’t seen any blunders from it.
When talking about the capabilities of a class of tools long term, it makes sense to be general. I think deriving conclusions at all is pretty difficult given how fast everything is moving, but there are some realities we do actually know about how LLMs work, and we can talk about those.
Knowing that ChatGPT output good tokens last Tuesday but Sonnet didn't does not tell us much about the future of these tools in general.