Hacker News

GodelNumbering yesterday at 7:18 PM

Makes it sound like a one-trick pony.


Replies

jascha_eng yesterday at 7:34 PM

Anthropic is leaning into agentic coding, and heavily so. It makes sense for them to use SWE-bench Verified as their main benchmark. It is also the one benchmark where Google did not take the top spot last week. Claude remains king; that's all that matters here.

grantpitt yesterday at 7:27 PM

Well, it's a big trick.