What's the hardware cost to running it?
I was curious, and some [intrepid soul](https://wavespeed.ai/blog/posts/deepseek-v4-gpu-vram-require...) did an analysis. Assuming you do everything perfectly and take full advantage of the model's MoE sparsity, it would take:
- To run at full precision: "16–24 H100s", giving us ~$400-600k upfront, or $8-12/h from [us-east-1](https://intuitionlabs.ai/articles/h100-rental-prices-cloud-c...).
- To run with "heavy quantization" (16 bits -> 8): "8xH100", giving us $200K upfront and $4/h.
- To run truly "locally"--i.e. in a house instead of a data center--you'd need four 4090s, one of the most powerful consumer GPUs available. Even that would clock in around $15k for the cards alone and ~$0.22/h for the electricity (in the US).
Truly an insane industry. This is a good reminder of why datacenter capex from since 2023 has eclipsed the Manhattan Project, the Apollo program, and the US interstate system combined...
"if you have to ask..."
Probably like 100 USD/hour