Hacker News

SwellJoe today at 9:53 AM

You can theoretically self-host, but DeepSeek is big. DS4 (the 2-bit quantization of DeepSeek Flash) runs on my Strix Halo with 128GB, but it's slow as hell, completely unusable for interactive work. Still, I guess a company that cared about data privacy and wanted a Good Enough local model could spend $100,000 or more on hardware to run it properly.


Replies

zozbot234 today at 10:17 AM

The DS4 author has demoed upcoming work on Strix Halo that makes it roughly competitive with the Apple Silicon equivalent (i.e. Pro models with similar memory bandwidth figures, not Max or Ultra). Maybe even a bit faster for prefill, and with further potential for running small batches in parallel (since the GPU clearly has some amount of compute headroom during decode).
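For anyone wondering why memory bandwidth is the figure being compared here, a rough back-of-envelope sketch, assuming decode is limited by streaming the active weights from memory once per step. The function name and the numbers in the usage lines are illustrative placeholders, not measurements of DS4, Strix Halo, or any Mac. The batch term is optimistic: it assumes every sequence in the batch reuses the same weight read, which a mixture-of-experts model only partly satisfies because different sequences can route to different experts.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             active_gb_per_step: float,
                             batch_size: int = 1) -> float:
    """Rough upper bound on aggregate decode throughput.

    Assumes each decode step is bound by reading `active_gb_per_step`
    gigabytes of weights from memory, and that all sequences in the
    batch share that single read (an optimistic assumption).
    """
    if bandwidth_gb_s <= 0 or active_gb_per_step <= 0:
        raise ValueError("bandwidth and bytes per step must be positive")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Number of full weight passes the memory system can deliver per second.
    steps_per_second = bandwidth_gb_s / active_gb_per_step
    # Each step emits one token per sequence in the batch.
    return steps_per_second * batch_size


# Placeholder inputs: 200 GB/s of bandwidth, 10 GB read per decode step.
print(decode_tokens_per_second(200.0, 10.0))      # 20.0
print(decode_tokens_per_second(200.0, 10.0, 4))   # 80.0
```

Under this model, two machines with similar bandwidth land on similar single-stream decode speeds, and spare GPU compute only shows up as throughput once several sequences are decoded per step.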

epolanski today at 10:14 AM

DS4 Flash runs okay on a MacBook Pro, though:

https://github.com/antirez/ds4#speed