On Windows or Linux you can run from RAM or split layers between RAM and VRAM; running fully on GPU is faster than either of those, but the limit on what you can run at all isn't VRAM, it's total memory.
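As a back-of-the-envelope sketch of why the split can work for a model that size (a minimal Python estimate; the ~80 layer count, the 2 GB of VRAM headroom, and the even per-layer size are all assumptions, not measurements):

```python
# Rough estimate of a RAM/VRAM layer split for a quantized 70B model.
# All constants below are assumptions for illustration, not measured values.

MODEL_SIZE_GB = 43.0    # quantized weights (the size reported for deepseek-r1:70b)
NUM_LAYERS = 80         # typical layer count for a 70B Llama-style model (assumption)
VRAM_GB = 24.0
RAM_GB = 32.0
VRAM_HEADROOM_GB = 2.0  # guessed reserve for KV cache / CUDA buffers

per_layer_gb = MODEL_SIZE_GB / NUM_LAYERS

# Put as many layers as fit on the GPU; the rest stay in system RAM.
gpu_layers = int((VRAM_GB - VRAM_HEADROOM_GB) / per_layer_gb)
cpu_layers = NUM_LAYERS - gpu_layers
ram_needed_gb = cpu_layers * per_layer_gb

print(f"~{per_layer_gb:.2f} GB per layer")
print(f"offload ~{gpu_layers} layers to the GPU, keep ~{cpu_layers} in RAM")
print(f"the CPU-side layers need ~{ram_needed_gb:.1f} GB of system RAM "
      f"(out of {RAM_GB:.0f} GB)")
```

With those guesses you'd offload roughly half the layers to the 24 GB card and keep about 21 GB of weights in system RAM, which is why a 43 GB model can still start on that machine, just slowly. In llama.cpp the split is set with the --n-gpu-layers option; Ollama estimates it automatically and, as far as I know, lets you override it via its num_gpu parameter.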
So is it possible to load the Ollama deepseek-r1 70b (43 GB) model on my 24 GB VRAM + 32 GB RAM machine?
Does this depend on how I load the model, i.e., with Ollama rather than one of the alternatives? AFAIK, Ollama is basically a llama.cpp wrapper.
I have tried to deploy one myself with OpenWebUI + Ollama, but only for small LLMs. I'm not sure about the bigger ones and worried they might somehow crash my machine. Are there any docs? I'm curious how this works, if there are any.