Does anyone know what kind of hardware is required to run it locally? There are instructions, but nothing about the hardware requirements.
DeepSeek V3 needs about 1 TB of VRAM/RAM, so roughly 10 A100s.
There are various ways to run it with less VRAM if you're OK with much worse latency and throughput.
Edit: sorry, this is for V3; the distilled models can be run on consumer-grade GPUs.
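For a rough sense of where numbers like that come from: weights-only memory is roughly parameter count times bytes per parameter. The figures below are ballpark assumptions, not measurements; KV cache, activations and runtime overhead come on top.

    # Weights-only memory ~= parameter count * bytes per parameter.
    # Ballpark only; KV cache, activations and runtime overhead are extra.
    def weight_gb(params_billions, bytes_per_param):
        return params_billions * bytes_per_param  # billions of params * bytes/param = GB

    print(weight_gb(671, 1.0))  # DeepSeek V3/R1: ~671B params at FP8 -> ~671 GB of weights
    print(weight_gb(8, 0.5))    # an 8B distill at ~4-bit quantization -> ~4 GB of weights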
You can try something like this to get a rough estimate: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calcul...
But you really don't know the exact numbers until you try; a lot of it is specific to the runtime and environment.
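One concrete reason the number moves around is the KV cache, which grows with context length. A rough sketch of that term, assuming you plug in the layer/head numbers from the model's config.json (the example values below are just placeholders for an 8B-class model):

    # KV cache bytes ~= 2 (K and V) * layers * kv_heads * head_dim * context_len * bytes_per_element
    def kv_cache_gb(layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
        return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 1e9

    # Placeholder architecture numbers for an 8B-class model at 32k context:
    print(kv_cache_gb(layers=32, kv_heads=8, head_dim=128, context_len=32_768))  # ~4.3 GB on top of the weights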
Also wondering about this. My company is giving me an MBP M4 Max with 128 GB in a couple of weeks. What can I run locally? I'm subscribed to OpenAI but usually end up using all 50 of the weekly o1 prompts.
It's just a question of having enough VRAM+RAM to fit the model into memory.
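On Apple Silicon the RAM and VRAM are the same unified pool, so checking free RAM is enough. A quick sanity check (the file path is a hypothetical example): a GGUF needs roughly its own file size in free memory, plus a few GB of headroom for the KV cache and the runtime.

    import os
    import psutil  # pip install psutil

    model_path = "DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf"  # example filename, adjust to yours
    model_gb = os.path.getsize(model_path) / 1e9
    free_gb = psutil.virtual_memory().available / 1e9
    print(f"model ~{model_gb:.1f} GB, free memory ~{free_gb:.1f} GB")
    print("should fit" if free_gb > model_gb + 4 else "probably too tight")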
The 7B distilled version works great on my laptop's CPU and iGPU with Vulkan. You can use llama.cpp (for the iGPU with Vulkan) or ollama (for the CPU).
My laptop is a cheap one from five years ago, not cutting-edge hardware.
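If you'd rather drive it from Python than the CLI, here's a minimal sketch with llama-cpp-python. The GGUF filename is just an example, and GPU offload via n_gpu_layers only works if your wheel was built with Vulkan/Metal/CUDA support; otherwise set it to 0 for CPU-only.

    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path="DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf",  # example filename
        n_ctx=4096,       # context window; bigger costs more memory (KV cache)
        n_gpu_layers=-1,  # offload all layers to the GPU; use 0 for CPU-only
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain the KV cache in two sentences."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])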
r1:14b outputs ~20 tokens/sec on my laptop with a 16 GB 3080 card.
They released a bunch of different-sized models, and there are already quantized versions showing up on HF.
https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-... for example has versions that are 3GB, 4GB, 5GB, 8GB and 16GB.
That 3GB one might work on a CPU machine with 4GB of RAM.
To get good performance you'll want a GPU with that much free VRAM, or an Apple Silicon machine with that much RAM.
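If it's useful, a hedged sketch of grabbing one of those quantized files with huggingface_hub. The repo id and filename are my guesses at the naming, so check the repo's file list for the exact quant you want.

    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    path = hf_hub_download(
        repo_id="unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF",  # assumed repo id
        filename="DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",  # assumed name of the ~5 GB 4-bit quant
    )
    print(path)  # point llama.cpp / llama-cpp-python / ollama (via a Modelfile) at this file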