logoalt Hacker News

baqtoday at 1:40 PM1 replyview on HN

...I sense an animated svg of a pelican playing simcity benchmark is brewing somewhere


Replies

aedtoday at 2:36 PM

Funny you say that! When the two new models were released Friday I spun up mayors for each. (But didn’t do the prompting in the most scientific way.)

Mayor Compounded Wonder - Claude Opus 4.6

https://hallucinatingsplines.com/mayors/compounded-wonder-2c...

Mayor Bronze Offramp - OpenAI Codex 3.6

https://hallucinatingsplines.com/mayors/bronze-offramp-09941...

TL;DR: Opus won.

Have also thought about using openrouter and getting one mayor per model running the same prompt through all of them to create potentially the world's dumbest LLM benchmark.