Yes, they introduced that Golang rewrite precisely to support the visual pipeline and other things that weren't in llama.cpp at the time. But llama.cpp usually catches up, and Ollama is left stranded with something that's not fully competitive. Right now it seems to have broken mmap support, which stops it from streaming model weights from storage when doing CPU inference with limited RAM, even as faster PCIe 5.0 SSDs are finally making that practical.
The project is just a bit underwhelming overall; it would be way better if they focused on polishing the UX and on fine-tuning, starting from a reasonably up-to-date version of what llama.cpp already provides.