Hacker News

iugtmkbdfil834, today at 12:35 PM

Seconded. Currently on ollama for local inference, but I am curious how it compares.


Replies

LumielGR, today at 2:11 PM

Lemonade uses llama.cpp for text and vision, with a nightly ROCm build. It can also load and serve multiple LLMs at the same time, create images, use whisper.cpp, use TTS models, use the NPU (e.g. Strix Halo amdxdna2), and more!