
cadamsdotcom • last Saturday at 5:39 AM • 2 replies

My suspicion (unconfirmed so take it with a grain of salt) is they either used some/all test data to train, or there was some leakage from the benchmark set into their training set.

That said, Sonnet 4.5 isn’t new, and there have been loads of innovations recently.

Exciting to see open models nipping at the heels of the big end of town. Let’s see what shakes out over the coming days.

Replies

pertymcpert • last Saturday at 5:54 AM

None of these open-source models can actually compete with Sonnet when it comes to real-life usage. They're all benchmaxxed, so in reality they're not "nipping at the heels". Which is a shame.


satvikpendem • last Saturday at 7:31 AM

You are correct on the leakage, as other comments describe.
