senko • yesterday at 9:33 PM

Just tested it with my version of the pelican test: a minimal RTS game implementation (zero-shot in codex cli): https://gist.github.com/senko/596a657b4c0bfd5c8d08f44e4e5347... (you'll have to download and open the file; sadly, GitHub refuses to serve it with the correct content type)

This is on the edge of what the frontier models can do. 5.4's result is better than what I got from 5.3-Codex and Opus 4.6. (Edit: it's nowhere near the RPG game from their blog post, which was presumably much more thoroughly specced out and built with a better engineering setup.)

I also tested it on a non-trivial task I had to do in an existing legacy codebase, and it breezed through something that Claude Code with Opus 4.6 was struggling with.

I don't know when Anthropic will fire back with their own update, but until then I'll spend a bit more time with Codex CLI and GPT 5.4.