Hacker News

flux3125 · today at 2:36 PM

To be fair, llama.cpp has gotten much easier to use lately with llama-server -hf <model name>. That said, the need to compile it yourself is still a pretty big barrier for most people.
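For reference, that's roughly all it takes now (the model repo here is just an example; any GGUF repo on Hugging Face should work):

    # download (and cache) a model straight from Hugging Face, then serve it
    llama-server -hf ggml-org/gemma-3-1b-it-GGUF
    # exposes an OpenAI-compatible API on http://localhost:8080 by default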


Replies

dTal · today at 8:21 PM

You don't need to compile it yourself, though? Unless you want CUDA support on Linux, I guess; dunno why you'd need such a silly thing:

https://github.com/ggml-org/llama.cpp/releases
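Rough sketch of using a prebuilt build on Linux; the asset name and the binary's location inside the archive vary by release, so treat <tag> as a placeholder:

    # download a prebuilt release instead of compiling
    curl -LO https://github.com/ggml-org/llama.cpp/releases/download/<tag>/llama-<tag>-bin-ubuntu-x64.zip
    unzip llama-<tag>-bin-ubuntu-x64.zip
    # run the server binary from wherever the archive unpacks it
    ./llama-server -hf <model name>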

ryandrake · today at 4:55 PM

I started with ollama, and now I'm using llama.cpp/llama-server's Router Mode, which lets you manage multiple models through a single server instance.
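If I understand Router Mode correctly, clients then pick a model per request via the standard OpenAI-style "model" field; something like this (the model name is illustrative, and the exact setup may differ, so check llama-server --help):

    # route a request to one of the managed models by name
    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "qwen2.5-7b-instruct", "messages": [{"role": "user", "content": "hello"}]}'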

One thing I haven't figured out: subjectively, ollama's model loading felt nearly instant, while I always seem to be waiting for llama.cpp to load models. That doesn't make sense, because it's ultimately the same software underneath. Maybe I should try ollama again to convince myself that I'm not crazy and that its loading wasn't actually instant.
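A crude way to take the subjectivity out of it: time a one-shot run, since model loading dominates the latency (model.gguf is a placeholder; -no-cnv skips interactive chat mode):

    # load the model, generate a single token, and exit; compare wall time across setups
    time llama-cli -m model.gguf -p "hi" -n 1 -no-cnv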