LUmBULtERA • today at 7:01 PM

That's yet to be determined. I think a lot of open-weight models are benchmaxxed, and their usefulness for many tasks is not reflected in their benchmark scores.

enraged_camel • today at 7:55 PM

Yes, this has been my experience. They all struggle with long-horizon tasks and eventually start going in circles.
