I’ve been benchmarking GGUF quants on Python coding tasks across a few hardware configurations.
- 4090: 27b-q4_k_m
- A100: 27b-q6_k
- 3×A100: 122b-a10b-q6_k_L
Using the Qwen team's "thinking" presets, non-agentic coding performance doesn't feel like a significant leap over unquantized GPT-OSS-120B. The quantized models show some hallucination and repetition on MuJoCo code with the default presence penalty. On the 4090, 27b-q4_k_m generates 30–35 tok/s at good output quality.
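For anyone reproducing the tok/s numbers, here is a minimal timing sketch. The `fake_generate` stub is a hypothetical stand-in for whatever backend call actually decodes (e.g. a llama.cpp invocation); only the wall-clock arithmetic is the point.

```python
import time

def tokens_per_second(generate, prompt, n_tokens):
    """Time one generation call and return decode throughput.

    `generate` is any callable that produces `n_tokens` tokens for
    `prompt`; swap in your real backend call here.
    """
    start = time.perf_counter()
    generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Hypothetical stub standing in for a real model call.
def fake_generate(prompt, n_tokens):
    time.sleep(0.01)  # pretend to decode n_tokens tokens

rate = tokens_per_second(fake_generate, "write a mujoco rollout", 256)
print(f"{rate:.1f} tok/s")
```

Note this measures prefill and decode together; for a cleaner decode-only number, subtract a short-generation baseline or use the backend's own timing output.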