Have you tried GPT-OSS-120b MXFP4 with reasoning effort set to high? Of all the models I can run within 96 GB, it seems to consistently give the best results. Which exact Llama model (and which quant, I suppose) gave you better results, and which variant did you compare it against, the 120b or the 20b?
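For anyone else wanting to try it: here's a minimal sketch of how I bump the effort, assuming gpt-oss-120b is served behind an OpenAI-compatible endpoint (the URL, port, and model name below are placeholders, adjust for your setup).

```python
# Minimal sketch: set gpt-oss reasoning effort to high via the system message.
# Assumes an OpenAI-compatible server (e.g. llama-server or vLLM) on localhost:8080;
# endpoint, port, and model name are placeholders, not from this thread.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        # gpt-oss reads its reasoning level ("Reasoning: low|medium|high")
        # from the system message in its harmony chat format; some servers
        # also expose a separate reasoning_effort option that injects this.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Explain MXFP4 quantization briefly."},
    ],
)
print(resp.choices[0].message.content)
```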