Cool but is there a reason they can't just make PRs for vLLM and llama.cpp? Or have their own f...

ilaksh • today at 1:02 PM • 1 reply • view on HN

Cool but is there a reason they can't just make PRs for vLLM and llama.cpp? Or have their own forks if they take too long to merge?

Replies

RealFloridaMan • today at 2:48 PM

They use the latest llama.cpp under the hood but built for specific AMD GPU hardware.

Lemonade is really just a management plane/proxy. It translates ollama/anthropic APIs to OpenAI format for llama.cpp. It runs different backends for sst/tts and image generation. Lets you manage it all in one place.

alt Hacker News

Replies