stephc_int13 • yesterday at 9:41 PM • 1 reply

This is a nice benchmark IMO. I would be curious to see how competitors and improved models would compare.

Replies

NitpickLawyer • yesterday at 9:55 PM

And how long will it take before an open model recreates this? The "vibe" consensus before "thinking" models really took off was that open models were ~6 months behind SotA. With the massive RL improvements, over the past 6 months I've thought the gap was actually increasing. This will be a nice little verifiable test going forward.
