I've been running the 'frontier' open-weight LLMs (mainly deepseek r1/v3) at home, and I find that they're best for asynchronous interactions. Give it a prompt and come back in 30-45 minutes to read the response. I've been running on a dual-socket 36-core Xeon with 768GB of RAM and it typically gets 1-2 tokens/sec. Great for research questions or coding prompts, not great for text auto-complete while programming.
Given the cost of the system, how long would it take to be less expensive than, for example, a $200/mo Claude Max subscription with Opus running?