vyr • yesterday at 5:05 PM • 0 replies

because it's not faster than the Ryzen 395's GPU. power efficiency matters less than TTFT (time to first token) for desktop users, especially when they're using their AMD box as a dedicated inference machine.

some older pre-395 AMD articles suggested it'd be possible to use the NPU for prefill and the GPU for decoding, and that the split would be faster than using either alone. we have yet to see that (even on Windows) for any usefully sized models, just toys like LLaMA-8B.
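
a rough back-of-envelope sketch of why the split only helps if NPU prefill actually beats GPU prefill. this assumes a simple model where TTFT is prompt tokens divided by prefill throughput, plus any cost of handing the KV cache from one device to the other. the function names and every number here are hypothetical placeholders, not measurements from any AMD part.

```python
def ttft_seconds(prompt_tokens, prefill_tok_per_s, handoff_s=0.0):
    """Time to first token: prefill time plus any device handoff cost."""
    return prompt_tokens / prefill_tok_per_s + handoff_s


def total_seconds(prompt_tokens, output_tokens,
                  prefill_tok_per_s, decode_tok_per_s, handoff_s=0.0):
    """End-to-end latency: TTFT plus time spent decoding the output."""
    ttft = ttft_seconds(prompt_tokens, prefill_tok_per_s, handoff_s)
    return ttft + output_tokens / decode_tok_per_s


# Placeholder throughputs, chosen only to make the arithmetic easy to follow.
PROMPT, OUTPUT = 2000, 200

# GPU does both prefill and decode.
gpu_only = total_seconds(PROMPT, OUTPUT,
                         prefill_tok_per_s=500, decode_tok_per_s=20)

# Hypothetical split: NPU prefill, GPU decode, with a handoff penalty.
split = total_seconds(PROMPT, OUTPUT,
                      prefill_tok_per_s=800, decode_tok_per_s=20,
                      handoff_s=0.5)

# Same split, but with an NPU that prefills slower than the GPU.
slow_split = total_seconds(PROMPT, OUTPUT,
                           prefill_tok_per_s=400, decode_tok_per_s=20,
                           handoff_s=0.5)

print(gpu_only, split, slow_split)
```

decode time is identical in all three cases since the GPU decodes either way, so the whole comparison comes down to prefill rate and handoff cost. if the NPU can't prefill faster than the GPU, the split makes TTFT worse.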
