> it gets caught in circles more often than, say, a 1T parameter model does.
I've found that the Q5+ quants are less loopy than Q4. Still not perfect, but noticeably better.
> reasonable enough tokens per second
The speed has been amazing. I've been running the recent llama.cpp MTP branch with an uncensored variant of Qwen3.6-35B-A3B on my RTX 3090 at over 170 tokens per second, and it turned a buffer overflow into a reliable shell exploit in just a few seconds (with reasoning disabled). Still a bit loopy, though. Hopefully the Qwen team will pay more attention to those looping issues; their models seem especially susceptible.
Is that on a single 3090? Sounds like I need to change my settings.