The 397B model can be run at home with the weights stored on an SSD (or on 2 SSDs, for double throughput).
Probably too slow for chat, but usable as a coding assistant.
I think you have that backwards. Agentic coding is way more demanding than simple chat. The request/response loops (tool calling) are much tighter and more numerous, and the context is waaaaay bigger in general.
I think you have that backwards. Agentic coding is way more demanding than simple chat. The request/response loops (tool calling) are much tighter and more numerous, and the context is waaaaay bigger in general.