Hacker News

meisel yesterday at 7:09 PM

I wonder if their "5.3" was being updated continuously, with benchmarks regenerated after each improvement, and they just kept it ready to release as soon as Claude released.


Replies

morleytj yesterday at 8:19 PM

This seems plausible. It would be shocking if these companies didn't have an automated test suite that recomputes these benchmarks on a regular basis and uploads the results to a dashboard for monitoring.

Given that they had already approved the announcement language and marketing materials, there's no real reason they couldn't just leave the release lined up behind a single function call, ready to go live once the key players make the call.