sieste • today at 3:56 PM • 2 replies

That's almost exactly my setup and I'm very happy with its performance.

I've noticed recently that I've started to prefer my local Qwen3.6 35B A3B and pi agent over Claude Code.

Both fail at different tasks, Qwen more often than Claude.

But the way Qwen fails is much more straightforward. In writing tasks, Qwen's hallucinations and bullshitting are much easier to spot because it doesn't have the sleek vocabulary and wordsmithing skills to disguise its ignorance.

In coding tasks that Qwen can't solve, it often just goes into a tool-calling doom loop that the pi harness can catch, whereas Claude attempts ever more convoluted and creative things, making more and more mess that takes forever to clean up.
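The post doesn't say how the pi harness actually detects a doom loop. As a hedged illustration only, one simple heuristic a harness could use is to flag the agent as stuck when the same tool call (tool name plus arguments) recurs several times within a short window of recent calls. `DoomLoopDetector`, `window` and `max_repeats` are hypothetical names and thresholds, not pi's API.

```python
from collections import deque


class DoomLoopDetector:
    """Hypothetical guard (not pi's real mechanism): flags an agent as stuck
    when an identical tool call keeps recurring among its most recent calls."""

    def __init__(self, window=6, max_repeats=3):
        self.max_repeats = max_repeats
        # Only the last `window` calls are remembered; older ones fall off.
        self._recent = deque(maxlen=window)

    def record(self, tool_name, args):
        """Record one tool call; return True if the agent looks stuck."""
        # Normalise the call so argument order doesn't matter.
        key = (tool_name, tuple(sorted(args.items())))
        self._recent.append(key)
        return self._recent.count(key) >= self.max_repeats


if __name__ == "__main__":
    detector = DoomLoopDetector()
    calls = [
        ("read_file", {"path": "a.py"}),
        ("run_tests", {}),
        ("read_file", {"path": "a.py"}),
        ("read_file", {"path": "a.py"}),  # third identical call in the window
    ]
    for name, args in calls:
        if detector.record(name, args):
            print(f"stuck: {name} {args}")
```

A real harness would likely also look at repeated errors or unchanged outputs, but counting identical calls is the cheapest signal and is hard for a small model to accidentally trip during normal work.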

I think part of the story is that the tasks I use AI for are fairly simple and maybe don't need a frontier model. But I wonder: have "proper" developers had a similar experience?

Replies

eurekin • today at 4:51 PM

I keep finding more and more use cases for Q3.6 27B (same league), and it performs best when the answer to my question is already in the context.

The moment I try something open-ended or ambitious, Claude/ChatGPT clearly get me to the goal quicker.

For things where there's a way to build a knowledge base, though, a local LLM can definitely be a true contender. Plus, with a big context and no worries about filling it over and over, you can get quite far.

I'm writing this literally in between cooking pasta with products the local LLM ordered for me online. I've built a grocery-shopping skill so that it roughly knows what I have in the fridge, my last 10 representative orders (general preferences plus rich info about nearby shops and SKUs), and actual real-time in-stock info. That last part has been my personal pet peeve with every product that promised cooking-ingredient delivery (other than kits packaged specifically for that).

This is what every big tech company with an agent has promised us, and now a local LLM has actually solved it for me completely.

porridgeraisin • today at 4:57 PM

This is true. The failure modes are simpler, and yes, the ceiling is lower as well. Smaller models are less stable over long sequences, so anything that needs a lot of CoT will be weaker. For example, I had a dumb lock + condvar setup with multiple defenses against lost wakeups in an N-producer, 1-consumer queue. Models generally need a lot of CoT before they realise they can switch it to a semaphore instead. Qwen typically isn't stable over such long CoTs and ends up adding more and more slop and band-aids, whereas a larger model outputs a long CoT and then realises it can replace 3 functions with 2 lines by using a semaphore.
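The commenter's actual code isn't shown, so this is only a minimal Python sketch, assuming a simple in-memory queue, of what the semaphore version of an N-producer, 1-consumer queue might look like. With a condition variable, the consumer has to re-check a `while not items: cond.wait()` predicate under the lock, and getting that wrong is where lost wakeups come from. A counting semaphore remembers every `release()`, even one that happens before the consumer calls `acquire()`, so that bookkeeping disappears. `SemaphoreQueue` and `run_demo` are hypothetical names.

```python
import threading
from collections import deque


class SemaphoreQueue:
    """N-producer / 1-consumer queue. A counting semaphore tracks how many
    items are available, so a put() that happens before the consumer blocks
    is never 'lost': the counter simply records it."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()             # guards the deque itself
        self._available = threading.Semaphore(0)  # counts queued items

    def put(self, item):
        with self._lock:
            self._items.append(item)
        self._available.release()  # signal: one more item can be consumed

    def get(self):
        self._available.acquire()  # blocks until at least one item exists
        with self._lock:
            return self._items.popleft()


def run_demo(num_producers=4, items_per_producer=1000):
    """Start several producer threads and drain everything in this thread."""
    q = SemaphoreQueue()

    def producer(pid):
        for i in range(items_per_producer):
            q.put((pid, i))

    threads = [threading.Thread(target=producer, args=(p,))
               for p in range(num_producers)]
    for t in threads:
        t.start()
    # Single consumer: take exactly as many items as were produced.
    received = [q.get() for _ in range(num_producers * items_per_producer)]
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(len(run_demo()))
```

Because each producer appends under the lock and the consumer pops in FIFO order, every producer's items arrive in the order that producer sent them, even though items from different producers interleave.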
