Stop Using Ollama

258 points • by Zetaphor • today at 3:35 AM • 56 comments • view on HN

Comments

For most users that wanted to run LLM locally, ollama solved the UX problem.

One command, and you are running the models even with the rocm drivers without knowing.

If llama provides such UX, they failed terrible at communicating that. Starting with the name. Llama.cpp: that's a cpp library! Ollama is the wrapper. That's the mental model. I don't want to build my own program! I just want to have fun :-P

➕ show 2 replies

denismi • today at 7:42 AM

Hmm..

  pacman -Ss ollama | wc -l                                                                                                              
  16
  pacman -Ss llama.cpp | wc -l
  0
  pacman -Ss lmstudio | wc -l
  0

Maybe some day.

0xbadcafebee • today at 6:30 AM

No mention of the fact that Ollama is about 1000x easier to use. Llama.cpp is a great project, but it's also one of the least user friendly pieces of software I've used. I don't think anyone in the project cares about normal users.

I started with Ollama, and it was great. But I moved to llama.cpp to have more up-to-date fixes. I still use Ollama to pull and list my models because it's so easy. I then built my own set of scripts to populate a separate cache directory of hardlinks so llama-swap can load the gguf's into llama.cpp.

➕ show 3 replies

Zetaphor • today at 3:36 AM

I got tired of repeating the same points and having to dig up sources every time, so here's the timeline (as I know it) in one place with sources.

➕ show 4 replies

usernomdeguerre • today at 5:04 AM

Do they still not let you change the default model folder? You had to go through this whole song and dance to manually register a model via a pointless dockerfile wannabe that then seemed to copy the original model into their hash storage (again, unable to change where that storage lived).

At the time I dropped it for LMStudio, which to be fair was not fully open source either, but at least exposed the model folder and integrated with HF rather than a proprietary model garden for no good reason.

➕ show 2 replies

thot_experiment • today at 7:42 AM

I was pretty big on ollama, it seemed like a great default solution. I had alpha that it was a trash organization but I didn't listen because I just liked having a reliable inference backend that didn't require me to install torch. I switched to llama.cpp for everything maybe 6 months ago because of how fucking frustrating every one of my interactions with ollama (the organization) were. I wanna publicly apologize to everyone who's concerns I brushed off. Ollama is a vampire on the culture and their demise cannot come soon enough.

FWIW llama.cpp does almost everything ollama does better than ollama with the exception of model management, but like, be real, you can just ask it to write an API of your preferred shape and qwen will handle it without issue.

TomGarden • today at 6:40 AM

The performance issues are crazy. Thanks for sharing this

san_tekart • today at 7:28 AM

The CLI is great locally, but the architecture fights you in production. Putting a stateful daemon that manages its own blob storage inside a container is a classic anti-pattern. I ended up moving to a proper stateless binary like llama-server for k8s.

osmsucks • today at 6:43 AM

I noticed the performance issues too. I started using Jan recently and tried running the same model via llama.cpp vs local ollama, and the llama.cpp one was noticeably faster.

Havoc • today at 7:39 AM

Alas people want convenience and don’t care about this sort of stuff.

NamlchakKhandro • today at 7:40 AM

LM Studio is 1000x easier to use than ollama btw

utopiah • today at 6:44 AM

Not sure why VLC doesn't do that.

It's a joke... but also not really? I mean VLC is "just" an interface to play videos. Videos are content files one "interact" with, mostly play/pause and few other functions like seeking. Because there are different video formats VLC relies on codecs to decode the videos, so basically delegating the "hard" part to codecs.

Now... what's the difference here? A model is a codec, the interactions are sending text/image/etc to it, output is text/image/etc out. It's not even radically bigger in size as videos can be huge, like models.

I'm confused as why this isn't a solved problem, especially (and yes I'm being a big sarcastic here, can't help myself) in a time where "AI" supposedly made all smart wise developers who rely on it 10x or even 1000x more productive.

Weird.

➕ show 1 reply

mentalgear • today at 7:09 AM

> Ollama is a Y Combinator-backed (W21) startup, founded by engineers who previously built a Docker GUI that was acquired by Docker Inc. The playbook is familiar: wrap an existing open-source project in a user-friendly interface, build a user base, raise money, then figure out monetization.

    The progression follows the pattern cleanly:

    1. Launch on open source, build on llama.cpp, gain community trust
    2. Minimize attribution, make the product look self-sufficient to investors
    3. Create lock-in, proprietary model registry format, hashed filenames that don’t work with other tools
    4. Launch closed-source components, the GUI app
    5. Add cloud services, the monetization vector

fy20 • today at 6:37 AM

It feels like a bit of history is missing... If ollama was founded 3 years before llama.cpp was released, what engine did they use then? When did they transition?

➕ show 2 replies

NamlchakKhandro • today at 7:40 AM

drop ollama in the bin, no one needs it.

tyfon • today at 6:17 AM

I think the biggest advantage for me with ollama is the ability to "hotswap" models with different utility instead of restarting the server with different models combined with the simple "ollama pull model". In other words, it has been quite convenient.

Due to this post I had to search a bit and it seems that llama.cpp recently got router support[1], so I need to have a look at this.

My main use for this is a discord bot where I have different models for different features like replying to messages with images/video or pure text, and non reply generation of sentiment and image descriptions. These all perform best with different models and it has been very convenient for the server to just swap in and out models on request.

[1] https://huggingface.co/blog/ggml-org/model-management-in-lla...

➕ show 3 replies

speedgoose • today at 6:24 AM

I prefer Ollama over the suggested alternatives.

I will switch once we have good user experience on simple features.

A new model is released on HF or the Ollama registry? One `ollama pull` and it's available. It's underwhelming? `ollama rm`.

➕ show 2 replies

dhruv3006 • today at 7:24 AM

ollama is pretty intuitive to use still - dont see why will stop.

DeathArrow • today at 7:34 AM

I see no mention of vLLM in the article.

yokoprime • today at 6:21 AM

i had no idea about all this. especially the performance and bugs. thanks for informing me!

dnnddidiej • today at 6:17 AM

On a practical note if fumbles connection handling as to be unusable to download anything.

goodpoint • today at 7:23 AM

The missing attribution pattern is nasty.

arcza • today at 7:01 AM

I find the style of writing incredibly annoying (it doesn't make the point, full of hyperbole) and the website has the standard slopsite black background and glowing CSS.

dackdel • today at 6:19 AM

i use goose by block

➕ show 1 reply

paganel • today at 7:34 AM

Another scummy YCombinator project, one of many lately. Looks like no-one is left at the wheel, at least as long as the valuations (and hence money) keep coming in.

ipeev • today at 7:46 AM

[dead]

alt Hacker News

Stop Using Ollama

Comments