Hacker News

dragonwriter · 01/21/2025 · 1 reply

On Windows or Linux you can run from RAM or split layers between RAM and VRAM; running fully on GPU is faster than either of those, but the limit on what you can run at all isn’t VRAM.
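
For example, in the llama.cpp ecosystem the split is just the number of transformer layers you offload to the GPU; everything else stays in system RAM and runs on the CPU. A minimal sketch using the llama-cpp-python bindings (the model path and layer count below are placeholders, not anything from this thread):

    # Sketch: split a GGUF model between VRAM and RAM via layer offloading.
    # Assumes `pip install llama-cpp-python` built with GPU support.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/some-model.gguf",  # placeholder path
        n_gpu_layers=40,  # layers offloaded to VRAM; the rest stay in RAM
        n_ctx=4096,       # context window size
    )

    out = llm("Q: Name a planet. A:", max_tokens=16)
    print(out["choices"][0]["text"])
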


Replies

akhdanfadh · 01/22/2025

So is it possible to load the ollama deepseek-r1 70b (43 GB) model on my 24 GB VRAM + 32 GB RAM machine? Does this depend on how I load the model, i.e., with ollama instead of other alternatives? AFAIK, ollama is basically a llama.cpp wrapper.

I have tried deploying one myself with OpenWebUI + Ollama, but only for small LLMs. I'm not sure about the bigger ones, and I'm worried they might crash my machine somehow. Are there any docs? I'm curious how this works.
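
For what it's worth, a rough sketch of steering that GPU/CPU split through Ollama's HTTP API: the num_gpu option sets how many layers are sent to the GPU, and the rest stay in system RAM. This assumes a local Ollama server on its default port and a model that has already been pulled; the layer count below is just a placeholder to experiment with, not a recommendation:

    # Sketch: ask Ollama to offload only part of the model to the GPU,
    # leaving the remaining layers in system RAM.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:70b",
            "prompt": "Why is the sky blue?",
            "stream": False,
            "options": {"num_gpu": 20},  # placeholder layer count
        },
        timeout=600,
    )
    print(resp.json()["response"])
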