
tyfon · today at 6:17 AM

I think the biggest advantage of ollama for me is the ability to "hotswap" models for different tasks instead of restarting the server with different models, combined with the simple "ollama pull model". In other words, it has been quite convenient.

This post prompted me to search a bit, and it seems that llama.cpp recently got router support[1], so I need to take a look at that.

My main use for this is a Discord bot where I have different models for different features: replying to messages with images/video or pure text, and non-reply generation of sentiment and image descriptions. These all perform best with different models, and it has been very convenient for the server to just swap models in and out on request.
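A sketch of what that per-feature routing might look like against Ollama's HTTP API (the `/api/generate` endpoint and its `model`/`prompt`/`stream` fields are Ollama's; the feature names, model choices, and helper functions here are hypothetical placeholders, not the actual bot's setup):

```python
import json
import urllib.request

# Hypothetical mapping of bot features to models; Ollama loads the
# requested model on demand, swapping out the previously loaded one.
FEATURE_MODELS = {
    "image_reply": "llava",   # assumption: a vision-capable model
    "text_reply": "llama3",   # assumption
    "sentiment": "llama3",    # assumption
}

def build_request(feature: str, prompt: str) -> dict:
    """Build an Ollama /api/generate payload for the given feature."""
    return {
        "model": FEATURE_MODELS[feature],
        "prompt": prompt,
        "stream": False,
    }

def generate(feature: str, prompt: str,
             host: str = "http://localhost:11434") -> str:
    """POST to the local Ollama server and return the completion text."""
    payload = json.dumps(build_request(feature, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The convenience is that the bot never manages server processes itself: each request just names a model, and the swap happens server-side.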

[1] https://huggingface.co/blog/ggml-org/model-management-in-lla...


Replies

majorchord · today at 6:32 AM

> the ability to "hotswap" models with different utility instead of restarting the server

The article mentions that llama-swap does this.
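For context, llama-swap sits in front of llama-server and starts/stops the matching backend based on the `model` field of incoming OpenAI-style requests. A minimal config sketch (the model names and paths are placeholders, and the exact schema is defined by llama-swap's own docs; this assumes its `models`/`cmd` layout with the `${PORT}` macro):

```yaml
models:
  # hypothetical text model; path is a placeholder
  "chat-model":
    cmd: llama-server --port ${PORT} -m /models/chat.gguf
  # hypothetical vision model; path is a placeholder
  "vision-model":
    cmd: llama-server --port ${PORT} -m /models/vision.gguf
```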

hacker_homie · today at 7:05 AM

Llama.cpp added the ability to load/switch models on demand with the max-models and models preset flags.

segmondy · today at 6:19 AM

You can do that with llama-server.