Hacker News

qeternity 01/20/2025

This says more about benchmarks than R1, which I do believe is absolutely an impressive model.

For instance, in coding tasks, Sonnet 3.5 has benchmarked below other models for some time now, but there is a fairly prevalent view that Sonnet 3.5 is still the best coding model.


Replies

radu_floricica 01/20/2025

Sonnet's strength was always comprehending the problem and its context. It happened to also be pretty good at generating code, but what actually made it the first really useful model was that it understood _what_ to code and how to communicate.

thegeomaster 01/20/2025

LiveBench (which I like because it tries very hard to avoid contamination) ranks Sonnet 3.5 second only to o1 (which is totally expected).

mordae 01/21/2025

Because it listens actively and asks questions.