Geerling benchmarked LLM performance on the Framework Desktop and the results look pretty lackluster to me. First, the software seems really immature: he couldn't get ROCm or the NPU working. When he finally got the iGPU working with Vulkan, he could only generate 5 tok/s with Llama 3.1 70B (a 40 GB model). That's intolerably slow for anything interactive like coding or chatting, though I suppose that's a matter of opinion.
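For what it's worth, that 5 tok/s figure is roughly what the memory system predicts, since single-stream decode is bandwidth-bound: every generated token has to read the whole model. A minimal sketch, assuming Strix Halo's ~256 GB/s theoretical bandwidth and a guessed ~80% sustained efficiency (both assumptions on my part, not measured):

```python
# Back-of-envelope: single-stream decode is memory-bandwidth-bound,
# so tokens/sec ~= effective bandwidth / bytes read per token.
model_size_gb = 40      # Llama 3.1 70B quant used in Geerling's test
peak_bw_gbps = 256      # assumed: Strix Halo's 256-bit LPDDR5X-8000
efficiency = 0.8        # assumed fraction of peak actually sustained

tok_per_sec = peak_bw_gbps * efficiency / model_size_gb
print(f"~{tok_per_sec:.1f} tok/s ceiling")   # ~5.1 tok/s
```

So even with perfect drivers, a 40 GB dense model on this memory system tops out around 5-6 tok/s; software maturity won't change that much.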
Prompt processing speeds are pretty poor too, imo. I was interested in one of these to run some of the 100B MoEs, but since they only manage 50-150 tok/s of prompt processing (depending on the model), it would take 5ish minutes to chew through a 32k context, which would be unbearably slow for me. And looking at the results in that link, it's even worse for the 70Bs: nearly 20 minutes to process a 32k context.
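The arithmetic, for anyone who wants to sanity-check it (the ~28 tok/s dense-70B prefill rate is back-derived from the "nearly 20 minutes" estimate, not a number taken from the link):

```python
# Prefill time for a 32k-token prompt at the quoted speeds.
context = 32_768
rates = {
    "100B MoE, fast end": 150,   # tok/s, from the quoted 50-150 range
    "100B MoE, slow end": 50,
    "70B dense (inferred)": 28,  # assumed; back-derived from "~20 min"
}
for label, tok_per_sec in rates.items():
    minutes = context / tok_per_sec / 60
    print(f"{label}: ~{minutes:.0f} min")
# -> ~4 min, ~11 min, ~20 min
```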
Ryzen AI Max is best with ~100B MoE models rather than large dense ones. Because an MoE only activates a small fraction of its weights per token, it decodes far faster than a dense model of similar total size. For example, OpenAI's gpt-oss-120b runs at around 40 tok/s and beats Llama 3.1 70B on most, if not all, benchmarks.
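A rough sketch of why, using the same bandwidth-bound reasoning as above (gpt-oss-120b's ~5.1B active parameters and ~4-bit MXFP4 weights are from OpenAI's published specs; the bandwidth and efficiency figures are still my assumptions):

```python
# Per-token reads scale with *active* parameters, so an MoE with a
# small active set out-decodes a dense model on the same bandwidth.
effective_bw = 256e9 * 0.8   # assumed: same ~80%-of-peak figure as above

# gpt-oss-120b: ~5.1B active params, MXFP4 weights (~0.5 bytes/param)
moe_bytes_per_tok = 5.1e9 * 0.5
# dense 70B at ~4-bit quant: the whole ~40 GB is read every token
dense_bytes_per_tok = 40e9

print(f"MoE ceiling:   ~{effective_bw / moe_bytes_per_tok:.0f} tok/s")
print(f"Dense ceiling: ~{effective_bw / dense_bytes_per_tok:.1f} tok/s")
# ~80 vs ~5 tok/s; KV-cache reads, attention, and routing overhead
# presumably pull the MoE number down toward the ~40 tok/s observed.
```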