jasonjmcghee • yesterday at 9:57 PM • 5 replies

Curious if anyone else had the same reaction as me.

This model is specifically trained on this task and significantly[1] underperforms Opus.

Opus costs about 6x more.

Which seems... totally worth it based on the task at hand.

[1]: based on the total spread of tested models

Replies

Agreed. The idea is nice and honorable. At the same time, if AI has proven one thing, it's that quality usually wins out over control and trust (except in some sensitive sectors and applications). Of course it's less capital-intensive, so it makes sense for a comparatively small EU startup to focus on that niche. It likely won't move the top-line needle much, though, for the reasons stated.

DarkNova6 • yesterday at 10:10 PM

I'm never sure how much faith one can put in such benchmarks, but in any case the picture seems to shift once you look at pass@2 and pass@3.

Still, the more interesting comparison would be against something such as Codex.

speedgoose • today at 6:52 AM

But you can run this model for free on a common battery-powered laptop sitting on your lap without cooking your legs.

nimchimpsky • today at 12:23 AM

The model is open source; you can run it locally. You don't think that's significant?
