logoalt Hacker News

2ndorderthoughttoday at 10:29 AM3 repliesview on HN

It's pretty close already. Check qwen3.6 27b if you haven't already. People are vibe and agentic coding with it on a single GPU.

It is more finicky than Claude but if you hand hold it a bit it's crazy.


Replies

gchamonlivetoday at 10:46 AM

I see that going around, and either the test cases are too simplistic or I'm doing something wrong. I have a server with a 3090 in it, enough to run qwen3.6, but I haven't had much luck using it with either codex or oh-my-pi. They work, but the model gets really slow with ~64k context and the attention degrades quickly. You'll sometimes execute a prompt, the model will load a test file and say something like "I was presented with a test file but no command. What should I do with it?".

So yeah, while it's true that qwen3.6 is good for agentic coding, it's not very good for exploring the codebase and coming up with plans. You need to pair it today with a model capable of ingesting the whole context and providing a detailed plan, and even then the implementation might take 10x the amount of time it'd take for sonnet or Gemini 3 to crunch through the plan.

EDIT:

My setup is really as simple as possible. I run ollama on a remote server on my local network. In my laptop I set OLLAMA_HOST and do `ollama pull qwen3.6:27b`, which then becomes available to the agent harnesses. I am not sure now how I set the context, but I think it was directly in oh-my-pi. So server config- and quantization-wise, it's the defaults.

show 7 replies
pizza234today at 12:11 PM

Vibe coding on consumer hardware is still very limited; this is especially true on GPUs, whose RAM limit is around 16 - maybe 24 - GB for the vast majority (although Macs change the equation).

These are two realworld experiments, whose results are disappointing for those expecting levels of performance comparable to cloud services:

- https://deploy.live/blog/running-local-llms-offline-on-a-ten...

- https://betweentheprompts.com/40000-feet/

The first is even the 35b version of qwen3.6.

show 1 reply
iugtmkbdfil834today at 12:01 PM

Eh. It is good in terms of results ( accuracy, good recommendations and so on ), but slow when it comes to actual inference. On local 128gb machine, it took over 5 minutes to brainstorm garage door opening mechanism with some additional restrictions for spice.

show 1 reply