Hacker News

weavie, today at 2:06 PM

How good are local LLMs at coding these days? Does anyone have any recommendations for how to get this set up? What would the minimum spend be for usable hardware?

I am getting bored of having to plan my weekends around quota limit reset times...


Replies

throwaway2027, today at 2:47 PM

Some claim that the recent smaller local models are as good as last year's Sonnet 4.5, and that the bigger high-end models can be almost as good as Claude, Gemini and Codex today, but others say they're benchmaxed (tuned to score well on benchmarks) and not representative.

To try things out you can use llama.cpp with Vulkan or even CPU and a small model like Gemma 4 26B-A4B, Gemma 4 31B, Qwen 3.5 35B-A3B or Qwen 3.5 27B. Some of the smaller quants fit within 16GB of GPU memory. The default people usually go with now is Q4_K_XL, a 4-bit quant that is a decent trade-off between quality and size.

https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF

https://huggingface.co/unsloth/gemma-4-31B-it-GGUF

https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF

https://huggingface.co/unsloth/Qwen3.5-27B-GGUF
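
If you want a rough idea of which quant fits on your card before downloading anything, the arithmetic is just parameters times bits per weight. Here is a minimal Python sketch of that. The effective bits per weight (about 4.8 for a 4-bit K-quant, about 3.5 for a 3-bit one) and the 2 GB allowance for KV cache and buffers are my own ballpark assumptions, not numbers from the model pages, so check the actual GGUF file sizes on Hugging Face.

```python
def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed overhead fit in the given VRAM.

    overhead_gb is a guess covering KV cache and runtime buffers;
    it grows with context length.
    """
    return quant_size_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Total parameter counts in billions, taken from the model names above.
    models = {"Gemma 4 26B-A4B": 26, "Gemma 4 31B": 31,
              "Qwen 3.5 35B-A3B": 35, "Qwen 3.5 27B": 27}
    for name, params in models.items():
        for bpw in (3.5, 4.8):          # assumed effective bits per weight
            size = quant_size_gb(params, bpw)
            print(f"{name} @ ~{bpw} bpw: ~{size:.1f} GB, "
                  f"fits 16GB: {fits_in_vram(params, bpw, 16)}, "
                  f"fits 24GB: {fits_in_vram(params, bpw, 24)}")
```

Under those assumptions a 27B model at a 4-bit quant wants a 24GB card, while a 3-bit quant of the same model squeezes into 16GB.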

Get a second-hand 3090/4090 or buy a new Intel Arc Pro B70. Use MoE models and offload to RAM for the best bang for your buck. For speed, try to find a model that fits entirely within VRAM. If you want to use multiple GPUs you might want to switch to vLLM or something else.
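
To see why MoE plus RAM offload is decent value, a common rule of thumb is that token generation is limited by memory bandwidth: every token has to read all the active weights once. The Python sketch below turns that into a rough upper bound. It is a simplification of my own that ignores KV cache reads, compute, PCIe transfers and the shared layers in MoE models, so real speeds will be lower. The bandwidth figures are spec-sheet values (936 GB/s for a 3090, 89.6 GB/s theoretical for dual-channel DDR5-5600), and the 4.8 bits per weight is an assumed figure for a 4-bit quant.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_billion: float,
                         bits_per_weight: float) -> float:
    """Upper bound on tokens/s if decoding is purely memory-bandwidth bound.

    Each generated token reads every active weight once, so the ceiling is
    bandwidth divided by the bytes of active weights.
    """
    active_gb = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / active_gb


if __name__ == "__main__":
    BPW = 4.8            # assumed effective bits per weight for a 4-bit quant
    GPU_3090 = 936.0     # GB/s, spec-sheet memory bandwidth
    DDR5_DUAL = 89.6     # GB/s, theoretical dual-channel DDR5-5600

    # Dense 27B: all 27B parameters are read for every token.
    print("dense 27B in VRAM:", round(decode_ceiling_tok_s(GPU_3090, 27, BPW), 1))
    print("dense 27B in RAM: ", round(decode_ceiling_tok_s(DDR5_DUAL, 27, BPW), 1))
    # MoE 35B-A3B: only about 3B parameters are active per token.
    print("MoE A3B in RAM:   ", round(decode_ceiling_tok_s(DDR5_DUAL, 3, BPW), 1))
```

The point is the ratio rather than the absolute numbers: with only about 3B active parameters, an A3B model running from system RAM has roughly nine times the ceiling of a dense 27B model in the same RAM.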

You can try any of the following models:

High-end: GLM 5.1, MiniMax M2.7

Medium: Gemma 4, Qwen 3.5

https://unsloth.ai/docs/models/minimax-m27

https://unsloth.ai/docs/models/glm-5.1

https://unsloth.ai/docs/models/gemma-4

https://unsloth.ai/docs/models/qwen3.5

https://github.com/ggml-org/llama.cpp
