> that rely on tool use for facts, and “knowledge bases” tuned for retrieval-heavy work
I would say this isn't exclusive to the smaller OSS models, but rather a trait of OpenAI's models altogether now.
This becomes especially apparent with the introduction of GPT-5 in ChatGPT. Their focus on routing your request to different modes and searching the web automatically (relying on agentic workflows in the background) is probably key to the overall quality of the output.
So far, it's quite easy to get their OSS models to follow instructions reliably. Qwen models have been pretty decent at this too for some time now.
I think if we give it another generation or two, we'll be at the point of having competent enough models to start running more advanced agentic workflows on modest hardware. We're almost there now, but not quite yet.