Just ran llama-bench at home with the similarly priced AMD AI PRO R9700 32GB. The Phoronix numbers look extremely low? Maybe I misunderstand their test bench. Anyway, here are some numbers. Maybe someone with access to a B70 can post a comparison.
Tried to use the same model as the article:
llama-bench -m gpt-oss-20b-Q8_0.gguf -ngl 999 -p 2048 -n 128
AMD R9700 pp2048=3867 tg128=175
And a bigger model, because testing a tiny model with a 32GB card feels like a waste:
llama-bench -m Qwen3.6-27B-UD-Q6_K_XL.gguf -ngl 999 -p 2048 -n 128
AMD R9700 pp2048=917 tg128=22
For reference, in case it's interesting to someone: a 5090 on Windows 11 with CUDA 13.1.
| model | size | params | backend | ngl | test | t/s |
| --------------------- | ---------: |--------: | -------- | --: |------: |----------------: |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 999 | pp2048 | 10179.12 ± 52.86 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | CUDA | 999 | tg128 | 326.82 ± 7.82 |
| qwen35 27B Q6_K | 23.87 GiB | 26.90 B | CUDA | 999 | pp2048 | 3129.92 ± 5.12 |
| qwen35 27B Q6_K | 23.87 GiB | 26.90 B | CUDA | 999 | tg128 | 53.45 ± 0.15 |
build: 9d34231bb (8929)
gpt-oss-20b-MXFP4.gguf
Qwen3.6-27B-UD-Q6_K_XL.gguf
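As a back-of-the-envelope comparison (using only the numbers posted above, mean t/s, ignoring the error bars), the 5090's advantage works out like this:

```python
# Rough speedup of the 5090 over the R9700, computed from the results above.
r9700 = {"gpt-oss-20b": {"pp2048": 3867, "tg128": 175},
         "qwen-27b":    {"pp2048": 917,  "tg128": 22}}
rtx5090 = {"gpt-oss-20b": {"pp2048": 10179.12, "tg128": 326.82},
           "qwen-27b":    {"pp2048": 3129.92,  "tg128": 53.45}}

for model, tests in r9700.items():
    for test, amd_tps in tests.items():
        ratio = rtx5090[model][test] / amd_tps
        print(f"{model} {test}: 5090 is {ratio:.1f}x faster")
```

So roughly 1.9x to 3.4x depending on the workload, with the biggest gap on prompt processing for the dense Qwen model.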
Using MXFP4 of GPT-OSS because it was trained quantization-aware for this quantization type, and it's native to the 50xx series. The build they use is from February, over two months old: https://github.com/ggml-org/llama.cpp/releases/tag/b8121
Which might not sound like much, but two months in LLM time is a long time, especially regarding support for new hardware like the R9700.
As of b8966, it is still not great.
Edit: I've no idea why one would use gpt-oss-20b at Q8, but the result is basically the same. Hopefully, support for the B70 will continue to improve. In retrospect, I probably should have bought an R9700 instead...