How much of a speedup might I get for, say, Qwen3.5-122B if I were to run with lemonade on my Strix ...

UncleOxidant • today at 6:19 PM • 1 reply

How much of a speedup might I get for, say, Qwen3.5-122B if I were to run it with Lemonade on my Strix Halo vs. running it with llama.cpp on Vulkan?

Replies

sawansri • today at 7:49 PM

You would get similar performance, because Lemonade uses llama.cpp as one of its backends. Lemonade is designed as a turnkey solution (optimized for AMD hardware) for running local AI models. It helps you manage backends (llama.cpp, flm, whispercpp, stable-diffusion.cpp, etc.) for different GenAI modalities from a single utility.

On the performance side, Lemonade comes bundled with ROCm and Vulkan builds of llama.cpp. These are sourced from https://github.com/lemonade-sdk/llamacpp-rocm and https://github.com/ggml-org/llama.cpp/releases respectively.
