This says more about benchmarks than about R1, which I do believe is an absolutely impressive model.
For instance, in coding tasks, Sonnet 3.5 has benchmarked below other models for some time now, but there is a fairly prevalent view that Sonnet 3.5 is still the best coding model.
LiveBench (which I like because it tries very hard to avoid contamination) ranks Sonnet 3.5 second only to o1 (which is totally expected).
Why? Because it listens actively and asks questions.
Sonnet's strength was always comprehending the problem and its context. It also happened to be pretty good at generating code, but what actually made it the first really useful model was that it understood _what_ to code and how to communicate.