That's yet to be determined. I think a lot of open-weight models are benchmaxxed, and benchmark scores don't reflect their usefulness for many tasks.
Yes, this has been my experience. They all struggle with long-horizon tasks and eventually start going in circles.