Hacker News

jcuenod · yesterday at 7:51 PM

Comparison to GPT-OSS-20B (irrespective of how you feel that model actually performs) doesn't fill me with confidence. Given GLM 4.7 seems like it could be competitive with Sonnet 4/4.5, I would have hoped that their flash model would run circles around GPT-OSS-120B. I do wish they would provide an Aider result for comparison. Aider may be saturated among SotA models, but it's not at this size.


Replies

victorbjorklund · yesterday at 9:59 PM

The benchmarks lie. I've been using GLM 4.7 and it's pretty okay with simple tasks, but it's nowhere near Sonnet. Still useful and good value, but it's not close.

syntaxing · yesterday at 8:01 PM

Hoping a 30B-A3B runs circles around a 117B-A5.1B is a bit of wishful thinking, especially when you're testing embedded knowledge. From the numbers, I think this model excels at agent calls compared to GPT-OSS-20B; the rest is about the same in terms of performance.

unsupp0rted · yesterday at 8:02 PM

> Given GLM 4.7 seems like it could be competitive with Sonnet 4/4.5

Not for code. The quality is so low that it's roughly on par with Sonnet 3.5.