zone411 • yesterday at 7:46 PM

I've benchmarked it on the Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/):

The high-reasoning version of GPT-5.2 improves on GPT-5.1: 69.9 → 77.9.

The medium-reasoning version also improves: 62.7 → 72.1.

The no-reasoning version also improves: 22.1 → 27.5.

Gemini 3 Pro and Grok 4.1 Fast Reasoning still score higher.
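
To put the GPT-5.1 → GPT-5.2 gains side by side, here is a minimal sketch that computes the absolute and relative improvement at each reasoning level. It uses only the scores quoted above; the names `scores`, `gains`, and `results` are illustrative and not part of the benchmark repository.

```python
# Scores quoted in the comment above: (GPT-5.1, GPT-5.2) per reasoning level.
scores = {
    "high": (69.9, 77.9),
    "medium": (62.7, 72.1),
    "none": (22.1, 27.5),
}


def gains(old, new):
    """Return (absolute gain in points, relative gain in percent of the old score)."""
    absolute = new - old
    relative = 100.0 * absolute / old
    return round(absolute, 1), round(relative, 1)


# Hypothetical helper structure: reasoning level -> (absolute, relative) gain.
results = {level: gains(old, new) for level, (old, new) in scores.items()}

for level, (absolute, relative) in results.items():
    print(f"{level:>6}: +{absolute} points ({relative}% relative)")
```

By this measure the medium-reasoning version has the largest absolute gain (9.4 points), while the no-reasoning version has the largest relative gain (about 24%), starting from a much lower base.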

Donald • yesterday at 7:57 PM

Gemini 3 Pro Preview gets 96.8% on the same benchmark? That's impressive.

Bombthecat • today at 7:20 AM

I would like to see a row for cost per percentage point, or something similar. I feel like Grok would beat them all.

tikotus • yesterday at 8:24 PM

Here's someone else testing models on a daily logic puzzle (Clues by Sam): https://www.nicksypteras.com/blog/cbs-benchmark.html GPT 5 Pro was already the winner in that test.

scrollop • yesterday at 9:47 PM

Why no Grok 4.1 reasoning?
