codazoda · yesterday at 10:32 PM

You mean Qwen3-Coder-Next? I haven't tried that model yet because I assume it's too big for me. I have a modest 16GB MacBook Air, so I'm restricted to really small stuff. I'm thinking about buying a machine with a GPU to run some of these.

Anywayz, maybe I should try some other models. The ones that haven't worked for tool calling, for me, are:

Llama3.1

Llama3.2

Qwen2.5-coder

Qwen3-coder

All of these at 7B, 8B, or sometimes (painfully) 30B sizes.

I should also note that I'm typically using Ollama. Maybe LM Studio or llama.cpp somehow improves on this?
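For anyone who wants to check a given model quickly: here's a minimal probe against Ollama's /api/chat endpoint, which accepts a tools array. The model tag and the tool definition below are just placeholders; swap in whatever you've actually pulled.

```python
# Minimal probe for tool-calling support through Ollama's chat API.
# Assumes Ollama is running locally on its default port (11434).
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

tools = [{
    "type": "function",
    "function": {
        "name": "get_file_contents",  # hypothetical tool, just for the probe
        "description": "Read a file from the local workspace.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = requests.post(OLLAMA_URL, json={
    "model": "qwen2.5-coder:7b",  # example tag; use whatever you have pulled
    "messages": [{"role": "user", "content": "Open README.md and summarize it."}],
    "tools": tools,
    "stream": False,
})
msg = resp.json().get("message", {})

# A model that handles tool calling returns structured tool_calls;
# one that doesn't tends to answer in plain text or emit pseudo-JSON.
if msg.get("tool_calls"):
    print("tool call:", json.dumps(msg["tool_calls"], indent=2))
else:
    print("no tool call; raw content:", msg.get("content"))
```

In my experience the failure mode is exactly what the else branch catches: the model describes the call in prose instead of emitting the structured tool_calls field.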


Replies

vessenes · today at 6:58 AM

I’m mostly out of the local model game, but I can say confidently that Llama will be a waste of time for agentic workflows - it was trained before agentic fine-tuning was a thing, as far as I know. It’s going to be tough for tool calling, probably regardless of the format you send the request in. Also, 8B models are tiny. You could significantly upgrade your inference quality and keep your privacy with, say, a machine at Lambda Labs, or some cheaper provider. Probably for $1/hr - where an hour buys many times more inference than an hour on your MBA.
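For what it's worth, a rough sketch of what that setup can look like, assuming you run Ollama on the rented box and forward its port over SSH (e.g. `ssh -N -L 11434:localhost:11434 user@your-rented-box`); the hostname and model tag are placeholders, not a Lambda-specific endpoint:

```python
# Talk to Ollama on a rented GPU box exactly as if it were local.
# The SSH tunnel keeps the traffic off the public internet, so your
# prompts never touch a third-party inference API.
import requests

REMOTE_URL = "http://localhost:11434/api/chat"  # tunneled to the GPU box

resp = requests.post(REMOTE_URL, json={
    "model": "qwen3-coder:30b",  # a size a 16GB laptop can't run comfortably
    "messages": [{"role": "user", "content": "Refactor this function..."}],
    "stream": False,
}, timeout=120)
print(resp.json()["message"]["content"])
```

The nice part of this arrangement is that nothing in your tooling changes: same client, same API, just a bigger model on the other end.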