Just ran llama-bench at home with the similarly priced AMD AI PRO R9700 32GB. The Phoronix numbers look extremely low? Maybe I misunderstand their test bench. Anyway, here are some numbers. Maybe someone with access to a B70 can post a comparison.
Tried to use the same model as the article:
llama-bench -m gpt-oss-20b-Q8_0.gguf -ngl 999 -p 2048 -n 128
AMD R9700 pp2048=3867 tg128=175
And a bigger model, because testing a tiny model with a 32GB card feels like a waste:
llama-bench -m Qwen3.6-27B-UD-Q6_K_XL.gguf -ngl 999 -p 2048 -n 128
AMD R9700 pp2048=917 tg128=22
For reference, in case it's interesting to someone: a 5090 on Windows 11 with CUDA 13.1.
| model | size | params | backend | ngl | test | t/s |
| --------------------- | ---------: |--------: | -------- | --: |------: |----------------: |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 999 | pp2048 | 10179.12 ± 52.86 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 999 | tg128 | 326.82 ± 7.82 |
| qwen35 27B Q6_K | 23.87 GiB | 26.90 B | CUDA | 999 | pp2048 | 3129.92 ± 5.12 |
| qwen35 27B Q6_K | 23.87 GiB | 26.90 B | CUDA | 999 | tg128 | 53.45 ± 0.15 |
build: 9d34231bb (8929)
gpt-oss-20b-MXFP4.gguf
Qwen3.6-27B-UD-Q6_K_XL.gguf
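As a back-of-the-envelope comparison (using only the numbers posted above, mean t/s, ignoring the error bars), the 5090's advantage works out like this:

```python
# Rough speedup of the 5090 over the R9700, computed from the results above.
r9700 = {"gpt-oss-20b": {"pp2048": 3867, "tg128": 175},
         "qwen-27b":    {"pp2048": 917,  "tg128": 22}}
rtx5090 = {"gpt-oss-20b": {"pp2048": 10179.12, "tg128": 326.82},
           "qwen-27b":    {"pp2048": 3129.92,  "tg128": 53.45}}

for model, tests in r9700.items():
    for test, amd_tps in tests.items():
        ratio = rtx5090[model][test] / amd_tps
        print(f"{model} {test}: 5090 is {ratio:.1f}x faster")
```

So roughly 1.9x to 3.4x depending on the workload, with the biggest gap on prompt processing for the dense Qwen model.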
Using MXFP4 of GPT-OSS because it was trained quantization-aware for this quantization type, and it's native to the 50xx series. The build they use is from February, over two months old: https://github.com/ggml-org/llama.cpp/releases/tag/b8121
Which might not sound like much, but two months in LLM time is a long time, especially regarding support for new hardware like the R9700.
As of b8966, it is still not great.
Edit: I've no idea why one would use gpt-oss-20b at Q8, but the result is basically the same. Hopefully, support for the B70 will continue to improve. In retrospect, I probably should have bought an R9700 instead...