Hacker News

hedgehog · 10/11/2024

Ollama (also wrapping llama.cpp) has GPU support; unless you're really in love with the idea of bundling weights into the inference executable, it's probably the better choice for most people.


Replies

jart · 10/11/2024

Ollama is great if you're really in love with the idea of having your multi-gigabyte models (likely the majority of your disk space) stored under obfuscated UUID filenames. Ollama also still hasn't addressed the license violations I reported to them back in March. https://github.com/ollama/ollama/issues/3185
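For anyone curious what those filenames actually correspond to: in recent Ollama builds the weights appear to be stored as content-addressed blobs, with the human-readable model name kept in a separate manifest. The sketch below maps names back to files on disk; the default ~/.ollama/models location, the OCI-style manifest JSON, and the sha256-<hex> blob naming are all assumptions about the current on-disk layout, not a documented API, so treat it as a rough illustration.

```python
#!/usr/bin/env python3
"""Map Ollama's digest-named blob files back to human-readable model names.

Assumed layout (may differ between versions): models under ~/.ollama/models
(or $OLLAMA_MODELS), OCI-style JSON manifests under manifests/, and blobs
stored as blobs/sha256-<hex>.
"""
import json
import os
from pathlib import Path

MODELS_DIR = Path(os.environ.get("OLLAMA_MODELS", Path.home() / ".ollama" / "models"))


def list_models():
    manifests = MODELS_DIR / "manifests"
    for manifest_path in manifests.rglob("*"):
        if not manifest_path.is_file():
            continue
        # The relative path encodes registry/namespace/model/tag,
        # e.g. registry.ollama.ai/library/llama3/latest
        name = "/".join(manifest_path.relative_to(manifests).parts)
        with open(manifest_path) as f:
            manifest = json.load(f)
        for layer in manifest.get("layers", []):
            digest = layer.get("digest", "")  # e.g. "sha256:abc123..."
            blob = MODELS_DIR / "blobs" / digest.replace(":", "-")
            size_gb = layer.get("size", 0) / 1e9
            print(f"{name}: {blob.name} ({size_gb:.1f} GB, {layer.get('mediaType', '?')})")


if __name__ == "__main__":
    list_models()
```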

yjftsjthsd-h · 10/11/2024

When I said

> such great performance that I've mostly given up on GPU for LLMs

I meant that I used to run ollama on GPU, but llamafile gave approximately the same performance on CPU alone, so I switched. That might just be because my GPU is weak by current standards, but that is in fact the comparison I was making.

Edit: Though to be clear, ollama would easily be my second pick; it also has minimal dependencies and is super easy to run locally.
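For anyone who wants to repeat that comparison on their own hardware, here is a minimal sketch that asks each locally running server for its own generation-speed numbers. It assumes Ollama's default port 11434 and the eval_count/eval_duration fields of /api/generate, and assumes llamafile is started in server mode on port 8080 and, like the llama.cpp server it wraps, returns a "timings" block from /completion; ports, endpoints, field names, and the "llama3" model tag are all assumptions that may not match your setup.

```python
#!/usr/bin/env python3
"""Rough tokens/sec comparison between a local Ollama server and a llamafile server."""
import json
import urllib.request

PROMPT = "Write a haiku about hedgehogs."


def post_json(url, payload):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)


def ollama_tps(model="llama3"):  # model tag is a placeholder; use one you've pulled
    r = post_json("http://localhost:11434/api/generate",
                  {"model": model, "prompt": PROMPT, "stream": False})
    # eval_duration is reported in nanoseconds
    return r["eval_count"] / (r["eval_duration"] / 1e9)


def llamafile_tps():
    r = post_json("http://localhost:8080/completion",
                  {"prompt": PROMPT, "n_predict": 128})
    return r["timings"]["predicted_per_second"]


if __name__ == "__main__":
    print(f"ollama:    {ollama_tps():.1f} tok/s")
    print(f"llamafile: {llamafile_tps():.1f} tok/s")
```

Running the same model through both servers (one with GPU offload, one CPU-only) is what makes a comparison like the one above meaningful; comparing different quantizations or context sizes would skew the numbers.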