sjanes • today at 2:39 PM • 3 replies

I've kind of given up on the routers for "free" inference. As you would expect, they tend to give you sub-par thinking because they are obviously trying to conserve as much inference as possible.

I've had some success turning my M1 Pro MacBook into a heating pad with Qwen 3.6 35B A3B MTP. Trying to use Gemini models "locally" gave a similar "short shrift" of effort, leading to mistakes and lots of turns. The reports of Fable being relentlessly "proactive" show you can go the other direction as well, if you have strong enough branding and effective invoicing.

Replies

WalterGR • today at 4:09 PM

> The reports of Fable being relentlessly "proactive"

For the curious: https://news.ycombinator.com/item?id=48498573 - “Claude Fable is relentlessly proactive”.

mft_ • today at 4:28 PM

Tangent: did the MTP help you at all? I’ve tested that model back to back on my M1 Max MBP and the MTP version was actually marginally worse. I wonder if I didn’t use the right settings, although I tried several based on the obvious sources.

ignoramous • today at 6:27 PM

> I've kind of given up on the routers for "free" inference. As you would expect, they tend to give you sub-par thinking because they are obviously trying to conserve as much inference as possible.

Xiaomi MiMo ($6/mo: https://platform.xiaomimimo.com/token-plan) & Alibaba Qwen ($50/mo: https://www.alibabacloud.com/en/campaign/ai-scene-coding) have generous limits on fixed subscriptions.
