
verdverm, today at 1:18 AM

on the Will It Mythos benchmark, small models are punching way above their weight(s)

gemma4-26B (#7)

qwen-3.6-27B (#9)

https://news.ycombinator.com/item?id=48640196


Replies

Gigachad, today at 1:26 AM

I've tried running qwen 3.6 locally, and it felt like LLMs a year ago: you can get them to do some stuff, but the tasks have to be very small and you have to course-correct them a lot, to the point where it's hard to say it's any faster than doing it all yourself.

Certainly the gap is closing, but I feel it still makes more sense to pay pennies to run the full-sized open models hosted on much better hardware.