Depends what you need the model to do. The recent granite4.1:3b takes just 2GB of memory and is fast. Results are pretty good and it supports tool calling. Barely a squeak out of the Mac laptop.
Even faster with the MLX builds.
Then when I need more heavy lifting I fire up a larger model.
IMHO the issue isn't the models. I've had OpenClaw running on local open models give the same results as Claude. Slower, but it does the job. What's needed is something that automatically switches to the right-sized model for each task.
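
Purely as an illustration of that kind of switching, here is a minimal sketch. Everything in it is an assumption: `LARGE_MODEL` is a placeholder name, and the keyword list and word limit are made-up heuristics, not anything OpenClaw or any other tool actually does. It only picks a model name; it doesn't call a model.

```python
from dataclasses import dataclass

SMALL_MODEL = "granite4.1:3b"    # the small model mentioned above
LARGE_MODEL = "my-larger-model"  # hypothetical placeholder, swap in your own

# Made-up hints that a task probably needs the bigger model.
HARD_HINTS = ("refactor", "prove", "architecture", "debug")


@dataclass
class Route:
    model: str
    reason: str


def pick_model(prompt: str, max_small_words: int = 200) -> Route:
    """Default to the small model; escalate on long or hard-looking prompts."""
    words = prompt.lower().split()
    if len(words) > max_small_words:
        return Route(LARGE_MODEL, "long prompt")
    if any(hint in words for hint in HARD_HINTS):
        return Route(LARGE_MODEL, "hard-task keyword")
    return Route(SMALL_MODEL, "default")


if __name__ == "__main__":
    for p in ("summarize this note", "please refactor this module"):
        print(p, "->", pick_model(p))
```

A real router would more likely escalate based on how the small model's answer turned out (failed tool call, low confidence) rather than on keywords, but the shape is the same: cheap model first, big one only when needed.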