Hacker News

lee_ars, yesterday at 11:18 PM

I'm currently working through research and testing for an article on Ars about the Spark and what one might do with it, and I've kind of stumbled into a two-LLM agentic setup: Qwen3.6-35B-A3B (via nvidia/Qwen3.6-35B-A3B-NVFP4) as the planning agent, and the FP8 version of Qwen3-Coder-30B-A3B-Instruct (Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8) as the coding agent that the planner delegates tasks to. I'm sticking with vLLM as the inference engine, and I've wired the two together into a two-agent loop with Opencode.

The Qwen3.6-35B-A3B planner hums along at 50-55 tokens/s, and the Qwen3-Coder-30B-A3B-Instruct coder does 30-35 tokens/s. With both agents up and ready to work, RAM consumption sits at about 112GB of 128GB.
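For a rough sense of where that 112GB goes, here is a back-of-the-envelope estimate of weight memory alone. It's a sketch under stated assumptions, not a measurement: parameter counts are read straight off the model names (35B and 30B), NVFP4 is taken as about 4.5 bits per weight (4-bit values plus one FP8 scale per 16-value block), FP8 as 8 bits, and any layers kept at higher precision are ignored. The helper name `weight_gb` is made up for the example.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    # billions of params * bits per param / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


# Both models are MoE ("A3B" = ~3B active per token), but every expert
# has to stay resident, so the *total* parameter count is what matters here.
# Assumed bits/weight: NVFP4 ~4.5 (4-bit values + one FP8 scale per 16), FP8 = 8.
planner_gb = weight_gb(35, 4.5)  # Qwen3.6-35B-A3B, NVFP4
coder_gb = weight_gb(30, 8)      # Qwen3-Coder-30B-A3B-Instruct, FP8
weights_gb = planner_gb + coder_gb

observed_gb = 112  # total RAM use reported with both agents loaded
remaining_gb = observed_gb - weights_gb

print(f"planner ~{planner_gb:.1f} GB, coder ~{coder_gb:.1f} GB, weights ~{weights_gb:.1f} GB")
print(f"left for KV cache, runtime and OS: ~{remaining_gb:.1f} GB")
```

If those assumptions hold, the weights come to roughly 50GB, and most of the other ~62GB would plausibly be KV cache that vLLM preallocates for the long context windows, plus runtime and OS overhead. That split is an inference, not something measured on the Spark.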

It's pretty okay. I'm faffing around with having it disassemble old MS-DOS games from the 1980s, which is a task that lends itself well to the setup. It's not the fastest thing in the world, but with the planner's context window at 256k tokens and the coding agent at 128k, they chew through pretty long task lists handing things back and forth without complaint. The only real issue is that even with really tightly scoped prompts, the coding agent tends to hallucinate like it's on LSD. But the planning agent appears to be quite good at spotting the hallucinations and re-parceling work back to the coder.
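The delegate, check, and re-parcel pattern described above can be sketched as a small work queue. This is a hypothetical illustration, not how Opencode actually wires its agents: `coder` and `verify` are stand-ins for calls to the coding model and the planner, and `run_loop`, `Task`, `max_attempts`, and the stub functions are names invented for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    description: str
    attempts: int = 0


def run_loop(
    tasks: list[str],
    coder: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> tuple[dict[str, str], list[str]]:
    """Hand tasks to the coder; re-queue any result the planner rejects."""
    queue = deque(Task(t) for t in tasks)
    accepted: dict[str, str] = {}
    failed: list[str] = []
    while queue:
        task = queue.popleft()
        task.attempts += 1
        result = coder(task.description)
        if verify(task.description, result):
            accepted[task.description] = result
        elif task.attempts < max_attempts:
            queue.append(task)  # re-parcel the task back to the coder
        else:
            failed.append(task.description)  # give up after max_attempts
    return accepted, failed


# Usage with stub agents: the fake coder "hallucinates" on its first try at each task.
calls: dict[str, int] = {}


def flaky_coder(desc: str) -> str:
    calls[desc] = calls.get(desc, 0) + 1
    return "made-up symbol" if calls[desc] == 1 else f"disasm of {desc}"


def planner_check(desc: str, result: str) -> bool:
    # Stand-in for the planner spotting a hallucinated result.
    return result == f"disasm of {desc}"


accepted, failed = run_loop(["decode header", "trace main loop"], flaky_coder, planner_check)
```

Capping attempts keeps one stubbornly hallucinated task from cycling forever; anything that exhausts its retries is reported separately rather than silently accepted.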

It's neat. I'm going to be sad when I have to return the review unit in a couple of months.

Edit: I've also been fiddling with Deepseek v4 Flash via Antirez's setup (https://github.com/antirez/ds4), and it's pretty fantastic (and fantastically easy to get running). It's pretty pokey on the Spark, though, at 14-ish tokens/sec. And unless you have a second Spark, it's going to be the only model you run at one time, as it eats alllll the rams.


Replies

mappu, today at 12:19 AM

Long time Ars reader, looking forward to your article (and have a few DOS games to reverse in mind already)!

Is this with a Ghidra MCP or some other technique? And why two models? Did you try using Qwen3.6-35B-A3B for everything? (Or the 27B, or a bigger model, since you have the RAM for it.)