Hacker News

jcuenod · yesterday at 7:51 PM

Comparison to GPT-OSS-20B (irrespective of how you feel that model actually performs) doesn't fill me with confidence. Given GLM 4.7 seems like it could be competitive with Sonnet 4/4.5, I would have hoped that their flash model would run circles around GPT-OSS-120B. I do wish they would provide an Aider result for comparison. Aider may be saturated among SotA models, but it's not at this size.


Replies

victorbjorklund · yesterday at 9:59 PM

The benchmarks lie. I've been using GLM 4.7 and it's pretty okay with simple tasks, but it's nowhere near Sonnet. Still useful and good value, but it's not close.

syntaxing · yesterday at 8:01 PM

Hoping a 30B-A3B runs circles around a 117B-A5.1B is a bit of wishful thinking, especially when you're testing embedded knowledge. From the numbers, I think this model excels at agent calls compared to GPT-OSS-20B; the rest is about the same in terms of performance.

unsupp0rted · yesterday at 8:02 PM

> Given GLM 4.7 seems like it could be competitive with Sonnet 4/4.5

Not for code. The quality is so low that it's roughly on par with Sonnet 3.5.