Hacker News

sschueller · 01/20/2025 · 7 replies

Does anyone know what kind of hardware is required to run it locally? There are instructions, but nothing about the hardware requirements.


Replies

simonw · 01/20/2025

They released a bunch of different-sized models, and there are already quantized versions showing up on HF.

https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-... for example has versions that are 3GB, 4GB, 5GB, 8GB and 16GB.

That 3GB one might work on a CPU machine with 4GB of RAM.

To get good performance you'll want a GPU with that much free VRAM, or an Apple Silicon machine with that much RAM.
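If you want to try that route, here's a minimal sketch using the llama-cpp-python bindings, assuming you've already downloaded one of those quantized GGUF files (the filename below is a placeholder, not the exact file on HF):

    # Minimal sketch with llama-cpp-python; the model filename is a placeholder.
    # n_gpu_layers=-1 offloads every layer to the GPU (CUDA, or Metal on Apple
    # Silicon); set it to 0 to stay entirely on the CPU.
    from llama_cpp import Llama

    llm = Llama(
        model_path="DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",  # placeholder path
        n_gpu_layers=-1,   # -1 = offload all layers; 0 = CPU only
        n_ctx=4096,        # context window; larger contexts need more memory
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain the KV cache in one paragraph."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])

On Apple Silicon the Metal backend draws from unified memory, which is why "free VRAM" and "free RAM" end up being the same pool there.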

qqqult · 01/20/2025

DeepSeek V3 requires about 1 TB of VRAM/RAM, so roughly 10 A100s.

There are various ways to run it with less VRAM if you're OK with much worse latency and throughput.

Edit: sorry, this is for V3; the distilled models can be run on consumer-grade GPUs.
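Rough arithmetic behind those numbers, counting weights only and ignoring KV cache and runtime overhead:

    # Back-of-the-envelope weight memory: parameters * bytes per parameter.
    # DeepSeek-V3 has ~671B total parameters; the Llama distill has ~8B.
    for model, params_b in [("DeepSeek-V3", 671), ("R1-Distill-Llama-8B", 8)]:
        for precision, bytes_per_param in [("FP16", 2.0), ("FP8", 1.0), ("~4-bit", 0.5)]:
            print(f"{model} @ {precision}: ~{params_b * bytes_per_param:.0f} GB of weights")

That's roughly why the full model needs a multi-GPU server (hundreds of GB to ~1.3 TB depending on precision) while a quantized 8B distill fits in a few GB on a consumer card.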

diggan · 01/20/2025

You can try something like this to get a rough estimate: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calcul...

But you really don't know the exact numbers until you try; a lot of it is specific to the runtime and environment.
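For the curious, a sketch of the kind of estimate such a calculator makes: quantized weights plus an FP16 KV cache. The layer/head numbers below are the approximate Llama-3-8B values the distill is based on, so treat them as assumptions and the result as a floor rather than a real measurement.

    # Weights (quantized) + KV cache (FP16), in GB. Real usage also depends on
    # the runtime, batch size and activation overhead, so this is only a floor.
    def estimate_gb(params_b, bits_per_weight, n_layers, n_kv_heads, head_dim, ctx_len):
        weights = params_b * 1e9 * bits_per_weight / 8                 # bytes
        kv_cache = 2 * n_layers * n_kv_heads * head_dim * ctx_len * 2  # K and V, FP16
        return (weights + kv_cache) / 1e9

    # 8B distill, ~4.5 bits/weight (Q4_K_M-style quant), 8k context -> ~5.6 GB
    print(f"~{estimate_gb(8, 4.5, 32, 8, 128, 8192):.1f} GB")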

steinvakt2 · 01/21/2025

Also wondering about this. My company is giving me an MBP M4 Max with 128 GB in a couple of weeks. What can I run locally? I'm subbed to OpenAI but usually end up using all 50 of my weekly o1 prompts.

heroiccocoa · 01/20/2025

It's just a question of having enough VRAM+RAM to fit the model into memory.

buyucu · 01/20/2025

The 7B distilled version works great on my laptop's CPU and iGPU with Vulkan. You can use llama.cpp (for the iGPU with Vulkan) or ollama (for the CPU).

My laptop is a cheap one from 5 years ago, not cutting-edge hardware.
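If you go the ollama route, here's a minimal sketch with the ollama Python client, assuming the distill has already been pulled (the exact model tag is an assumption and may differ on your install):

    # Minimal sketch with the ollama Python client; assumes the model was
    # pulled beforehand, e.g. with `ollama pull deepseek-r1:7b` (tag assumed).
    import ollama

    response = ollama.chat(
        model="deepseek-r1:7b",  # assumed tag for the 7B distill
        messages=[{"role": "user", "content": "What is an iGPU?"}],
    )
    print(response["message"]["content"])

ollama falls back to the CPU when it doesn't detect a supported GPU, which matches the setup described above.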

EVa5I7bHFq9mnYK · 01/21/2025

r1:14b outputs ~20 tokens/sec on my laptop with a 16 GB 3080 card.