Hacker News

adrian_b, today at 10:08 AM

The CPU of Strix Halo has good BF16 acceleration, like any other Zen 4/Zen 5 CPU (the future Zen 6 will add FP16 acceleration).

I do not know about its GPU, which might have only FP16.

So it is likely that the right inference strategy would be to run any BF16 computations on the Strix Halo CPU, while running the quantized computations on its GPU.


Replies

tssge, today at 11:13 AM

The GPU has INT4, INT8, BF16 and FP16. Notably, it has no FP8 or FP4. The official GPTQ-Int4 release from Qwen is a great quant for this, but custom kernels are still rare for this hardware.
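
To make the split discussed in this thread concrete, here is a rough sketch of a per-tensor device picker. It only encodes the format lists as stated above (GPU: INT4, INT8, BF16, FP16; CPU: BF16 on Zen 4/Zen 5). Every name in it (`GPU_NATIVE`, `CPU_NATIVE`, `pick_device`, `plan`, the layer names) is made up for illustration and is not part of any real runtime, and it ignores bandwidth and kernel availability entirely.

```python
# Hypothetical sketch: names and structure are assumptions, not a real API.
GPU_NATIVE = frozenset({"int4", "int8", "bf16", "fp16"})  # per the reply above
CPU_NATIVE = frozenset({"bf16"})  # Zen 4/Zen 5 BF16, per the parent comment


def pick_device(dtype: str, prefer_cpu_for_bf16: bool = False) -> str:
    """Return 'gpu', 'cpu', or 'unsupported' for a tensor format."""
    d = dtype.lower()
    # Optional policy from the parent comment: keep BF16 work on the CPU.
    if d == "bf16" and prefer_cpu_for_bf16 and d in CPU_NATIVE:
        return "cpu"
    if d in GPU_NATIVE:
        return "gpu"
    if d in CPU_NATIVE:
        return "cpu"
    # e.g. fp8 / fp4: no native path listed, would need conversion first.
    return "unsupported"


def plan(layers, prefer_cpu_for_bf16: bool = False):
    """Map (layer_name, dtype) pairs to a device choice."""
    return {name: pick_device(dtype, prefer_cpu_for_bf16) for name, dtype in layers}


if __name__ == "__main__":
    # Made-up layer names, for illustration only.
    example = [("attn.qkv", "int4"), ("norm", "bf16"), ("mlp.up", "fp8")]
    print(plan(example))
    print(plan(example, prefer_cpu_for_bf16=True))
```

With `prefer_cpu_for_bf16=False` the BF16 tensors stay on the GPU, which matches the reply's point that the GPU handles BF16 too; setting it to `True` reproduces the CPU/GPU split proposed in the parent comment.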