kgeist • last Monday at 2:22 AM

>Qwen3 32B is 6 times slower than GPT OSS 120B.

Only if the 120B model fits entirely in VRAM. Otherwise, for me, on a consumer GPU with only 32 GB of VRAM, gpt-oss 120B is actually almost 2 times slower than Qwen3 32B (37 tok/sec vs. 65 tok/sec).
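For anyone who wants to check the arithmetic, here is a rough sketch. The throughput numbers are the ones quoted above; everything else is an assumption for illustration: the parameter counts are just the nominal sizes from the model names, 4 bits per weight is a guessed quantization level, and the helper names are made up. It only counts the weights, so KV cache and activations would need extra room on top.

```python
def slowdown(fast_tok_s: float, slow_tok_s: float) -> float:
    """How many times slower the slow model generates tokens."""
    return fast_tok_s / slow_tok_s


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float, vram_gb: float) -> bool:
    """True if the weights alone fit; ignores KV cache and activations."""
    return weight_gb(params_billion, bits_per_weight) <= vram_gb


# Measured throughput from the comment: 65 tok/sec vs. 37 tok/sec.
print(f"gpt-oss 120B slowdown: {slowdown(65, 37):.2f}x")

# Assumed: nominal parameter counts and 4-bit weights on a 32 GB card.
VRAM_GB = 32
for name, params_b in [("Qwen3 32B", 32), ("gpt-oss 120B", 120)]:
    size = weight_gb(params_b, 4)
    verdict = "fits" if fits_in_vram(params_b, 4, VRAM_GB) else "does not fit"
    print(f"{name}: ~{size:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

Under those assumptions the 32B weights fit on the card and the 120B weights do not, which matches the point that part of the larger model has to live outside VRAM.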
