
cranberryturkey · last Sunday at 8:09 PM · 3 replies

Qwen3 is slow though. I used it; it worked, but it was slow and lacking features.


Replies

kgeist · last Monday at 2:34 AM

On my RTX 5090 with llama.cpp:

gpt-oss 120B - 37 tok/sec (with CPU offloading, doesn't fit in the GPU entirely)

Qwen3 32B - 65 tok/sec

Qwen3 30B-A3B - 150 tok/sec

(all at 4-bit)
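
For anyone wanting to reproduce numbers like these, here is a minimal sketch using llama-cpp-python (one of several ways to drive llama.cpp). The GGUF filename, prompt, and layer count are placeholders, not kgeist's actual setup; actual throughput depends on the quant, context size, and how many layers fit in VRAM.

    # Rough tokens/sec measurement with llama-cpp-python (pip install llama-cpp-python).
    # Model path and settings below are illustrative placeholders.
    import time
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen3-30b-a3b-q4_k_m.gguf",  # any 4-bit GGUF quant
        n_gpu_layers=-1,  # offload all layers to the GPU; lower this if VRAM runs out
        n_ctx=4096,
    )

    start = time.time()
    out = llm("Explain KV caching in one paragraph.", max_tokens=256)
    elapsed = time.time() - start

    n_tokens = out["usage"]["completion_tokens"]
    print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/sec")
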

xfalcox · last Monday at 12:17 AM

Qwen 3 is not slow by any metric.

Which model, inference software and hardware are you running it on?

The 30BA3B variant flies on any GPU.

SchemaLoad · last Monday at 2:06 AM

GPT-OSS is slow too. Gemma3 gives me better results and runs faster.