scosman • today at 3:41 PM • 1 reply

Benchmarking one or a handful of samples is never going to yield anything but noise. The actual benchmarks use thousands of tasks.
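To put a rough number on the noise point, here is a minimal sketch. It assumes each task is an independent pass/fail trial with the same underlying pass rate, which is an idealization. The 70% pass rate, the sample sizes, and the function name are made up for illustration and do not come from any real benchmark.

```python
import math


def pass_rate_standard_error(p: float, n: int) -> float:
    """Standard error of an observed pass rate over n independent pass/fail tasks.

    Uses the binomial formula sqrt(p * (1 - p) / n).
    """
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical true pass rate of 70%; sample sizes are illustrative only.
for n in (1, 5, 100, 5000):
    se = pass_rate_standard_error(0.7, n)
    print(f"n={n:>5}  standard error ~ {se:.3f}")
```

Under these assumptions the standard error is roughly 46 percentage points with one sample and under one percentage point with 5000, so a gap of a few points between two models only becomes visible at the larger sizes.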

GPT 5.5 genuinely was back on top for a while there, but if you look at the past 2 years, being on Claude was better than being on OpenAI most of the time. If you're going to pick one tool and not switch constantly, it was the right choice. Not to mention their tooling has always been ahead, and that brings ecosystem benefits.

Are they close and interchangeable today? Sure. But Sonnet was genuinely way better than anything OpenAI offered for a long time. The valuation reflects that track record, not any given moment in time.

Replies

bluebands • today at 3:59 PM

okay what's a point in time where Claude was better? just give me a date
