Anyone compared this to ollama? I had good success with the latest ollama and ROCm 7.4 on a 9070 XT a few days ago.
I'm also curious about this one, also I want to compare this to vLLM.
It's optimized for compatibility across different APIs and ships hardware-specific builds for AMD GPUs and NPUs. It's run by AMD.
Under the hood they both run llama.cpp, but this one has specific builds for different GPUs. Not sure whether the 9070 is covered; I'm running it on 370 and 395 APUs.
Seconded. Currently on ollama for local inference, but I am curious how it compares.
I just compared the two on my MacBook M1 Max with 64 GB RAM:

Model: qwen3.59b
Prompt: "Hey, tell me a story about going to space"
Ollama completed in about 1:44; Lemonade completed in about 1:14.
So it seems faster in this very limited test.
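If anyone wants to repeat this comparison a bit more systematically, both Ollama and Lemonade expose an OpenAI-compatible chat endpoint, so one timing script can hit both. This is a minimal sketch; the base URLs assume the default ports (11434 for Ollama, 8000 for Lemonade), and the model tag is a placeholder for whatever model you actually have pulled on each server.

```python
# Hedged sketch: time one streamed completion against any
# OpenAI-compatible server. Base URLs and the model tag below
# are assumptions; adjust them for your setup.
import json
import time
import urllib.request

def parse_sse_content(raw_lines):
    """Extract generated text pieces from OpenAI-style SSE 'data:' lines."""
    pieces = []
    for line in raw_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        pieces.append(delta.get("content", ""))
    return pieces

def time_completion(base_url, model, prompt):
    """Return (elapsed_seconds, chunk_count) for one streamed completion."""
    body = json.dumps({
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        pieces = parse_sse_content(l.decode() for l in resp)
    return time.perf_counter() - start, len(pieces)

if __name__ == "__main__":
    model = "qwen3:4b"  # placeholder tag; use whatever model you pulled
    prompt = "Hey, tell me a story about going to space"
    for name, url in [("ollama", "http://localhost:11434"),
                      ("lemonade", "http://localhost:8000")]:
        elapsed, chunks = time_completion(url, model, prompt)
        print(f"{name}: {elapsed:.1f}s, {chunks} chunks")
```

Wall-clock time alone mixes prompt processing and generation, and output length varies run to run, so averaging a few runs (or dividing chunk count by elapsed time for a rough tokens/sec) gives a fairer picture than a single stopwatch reading.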