I’ve been benchmarking GGUF quants on Python coding tasks across a few hardware configurations.
- 4090: 27b-q4_k_m
- A100: 27b-q6_k
- 3×A100: 122b-a10b-q6_k_L
Using the Qwen team's "thinking" presets, non-agentic coding performance doesn't feel like a significant leap over unquantized GPT-OSS-120B. The quantized models show some hallucination and repetition on MuJoCo code with the default presence penalty. On the 4090, 27b-q4_k_m generates 30–35 tok/s at good output quality.
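For anyone reproducing the tok/s numbers, here is a minimal timing sketch. The `fake_generate` stub is a hypothetical stand-in for whatever backend call actually decodes (e.g. a llama.cpp invocation); only the wall-clock arithmetic is the point.

```python
import time

def tokens_per_second(generate, prompt, n_tokens):
    """Time one generation call and return decode throughput.

    `generate` is any callable that produces `n_tokens` tokens for
    `prompt`; swap in your real backend call here.
    """
    start = time.perf_counter()
    generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Hypothetical stub standing in for a real model call.
def fake_generate(prompt, n_tokens):
    time.sleep(0.01)  # pretend to decode n_tokens tokens

rate = tokens_per_second(fake_generate, "write a mujoco rollout", 256)
print(f"{rate:.1f} tok/s")
```

Note this measures prefill and decode together; for a cleaner decode-only number, subtract a short-generation baseline or use the backend's own timing output.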