mechagodzilla • yesterday at 11:54 PM • 1 reply

I've been running the 'frontier' open-weight LLMs (mainly DeepSeek R1/V3) at home, and I find that they're best for asynchronous interactions: give it a prompt and come back in 30-45 minutes to read the response. My machine is a dual-socket 36-core Xeon with 768GB of RAM, and it typically gets 1-2 tokens/sec. Great for research questions or coding prompts, not great for text auto-complete while programming.
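
For a rough sense of scale, the figures above (1-2 tokens/sec, 30-45 minute waits) imply a response budget of a few thousand tokens. A minimal sketch of that arithmetic, assuming generation runs at a steady rate for the whole wait and ignoring prompt-processing time; the function name `tokens_generated` is made up for illustration:

```python
def tokens_generated(tokens_per_sec: float, minutes: float) -> int:
    """Tokens produced at a constant rate over a given wall-clock wait.

    Assumption: steady throughput, no time spent on prompt processing.
    """
    return int(tokens_per_sec * minutes * 60)


# Slowest case from the comment: 1 token/sec for 30 minutes.
low = tokens_generated(1, 30)
# Fastest case from the comment: 2 tokens/sec for 45 minutes.
high = tokens_generated(2, 45)

print(f"{low} to {high} tokens per response")
```

So a single wait covers somewhere between 1,800 and 5,400 generated tokens, which is consistent with long research or coding answers rather than interactive completion.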

Replies

tyre • today at 12:12 AM

Given the cost of the system, how long would it take to be less expensive than, for example, a $200/mo Claude Max subscription with Opus running?
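
The break-even question can be framed as simple arithmetic. The thread doesn't state what the machine cost or what it draws in power, so the sketch below takes both as parameters; the function name `months_to_break_even` and the $6,000 figure in the example are purely hypothetical placeholders, and only the $200/mo subscription price comes from the question above.

```python
from typing import Optional


def months_to_break_even(
    hardware_cost: float,
    subscription_per_month: float = 200.0,  # price quoted in the question
    running_cost_per_month: float = 0.0,    # e.g. electricity; unknown here
) -> Optional[float]:
    """Months until the one-time hardware cost is offset by saved fees.

    Returns None when the monthly running cost meets or exceeds the
    subscription price, since the machine then never breaks even.
    """
    monthly_saving = subscription_per_month - running_cost_per_month
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


# Hypothetical example: a $6,000 machine with no running costs counted.
print(months_to_break_even(6000.0))
```

This compares cost only; it says nothing about the gap in speed or model quality between the two setups.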
