embedding-shape • today at 8:20 AM

Have you tried GPT-OSS-120b MXFP4 with reasoning effort set to high? Out of all models I can run within 96GB, it seems to consistently give better results. What exact llama model (+ quant I suppose) is it that you've had better results against, and what did you compare it against, the 120b or 20b variant?

Replies

magic_hamster • today at 9:02 AM

How are you running this? I've had issues with Opencode formulating bad messages when the model runs on llama.cpp. Jinja threw a bunch of errors and GPT-OSS couldn't make tool calls. There's an issue for this on Opencode's repo, but it seems like it's been waiting for a couple of weeks.

> What exact llama model (+ quant I suppose) is it that you've had better results against

Not llama, but Qwen3-coder-next (Q8_K_XL) is at the top of my list right now. It's incredible, and not just for coding.
