I made my own benchmarks, very basic questions, and Claude 4.6 is actually worse than the free Stepf...

XCSme • yesterday at 11:06 PM

I made my own benchmarks with very basic questions, and Claude 4.6 actually scores worse than the free Stepfun 3.5 version: https://aibenchy.com

It is smart, but it fails at basic instruction following sometimes.

I remember this being a Claude thing for quite a while: I kept trying to make it output just JSON (without structured output), and it kept adding quotes or newlines.
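
For anyone hitting the same thing without structured output, here is a minimal sketch of a tolerant parser. It scans the reply for the first span that decodes as a JSON object or array and ignores whatever surrounds it (code fences, wrapping quotes, extra newlines). `extract_json` is a hypothetical helper name, not part of any model SDK, and it won't recover JSON whose inner quotes have been backslash-escaped.

```python
import json


def extract_json(reply: str):
    """Return the first JSON object or array found in reply, or None.

    Hypothetical helper: tolerates text, code fences, or quotes around the JSON.
    """
    decoder = json.JSONDecoder()
    for i, ch in enumerate(reply):
        if ch in "{[":
            try:
                # raw_decode parses one JSON value starting at index i
                # and ignores anything that follows it.
                value, _end = decoder.raw_decode(reply, i)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from here, try the next bracket
    return None


fenced = '```json\n{"answer": 4}\n```'
print(extract_json(fenced))
```

This only works around the formatting issue on the consumer side. It says nothing about whether the model followed the instruction.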

Replies

XCSme • today at 1:21 AM

After looking more into it, Claude DOES give the correct answer, just not in the format that was asked for. It always adds more info at the end, even when asked to give just the answer...
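
To make the distinction concrete, here is a rough sketch of two ways a benchmark could grade the same reply. Both function names are made up for illustration, and this is not necessarily how aibenchy.com scores answers. The strict check requires the reply to be exactly the expected answer; the lenient one only looks at the first non-empty line, so trailing explanation is ignored.

```python
def strict_match(reply: str, expected: str) -> bool:
    """Pass only if the whole reply is the expected answer (format counts)."""
    return reply.strip() == expected


def lenient_match(reply: str, expected: str) -> bool:
    """Pass if the first non-empty line is the expected answer (format ignored)."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    return bool(lines) and lines[0] == expected


reply = "42\n\nThis is because 6 * 7 = 42."
print(strict_match(reply, "42"), lenient_match(reply, "42"))
```

A reply like the one above is correct under the lenient check and a failure under the strict one, which is the gap between "wrong answer" and "didn't follow the format".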

