...I sense an animated svg of a pelican playing simcity benchmark is brewing somewhere
Funny you say that! When the two new models were released Friday I spun up mayors for each. (But didn’t do the prompting in the most scientific way.)
Mayor Compounded Wonder - Claude Opus 4.6
https://hallucinatingsplines.com/mayors/compounded-wonder-2c...
Mayor Bronze Offramp - OpenAI Codex 3.6
https://hallucinatingsplines.com/mayors/bronze-offramp-09941...
TL;DR: Opus won.
I've also thought about using OpenRouter to run one mayor per model, all on the same prompt, to create what is potentially the world's dumbest LLM benchmark.