This older HN thread shows R1 running on a ~$2k box using ~512 GB of system RAM, no GPU, at ~3.5-4.25 TPS: https://news.ycombinator.com/item?id=42897205
If you scale that setup and add a couple of used RTX 3090s with heavy memory offloading, you can technically run something in the K2 class.
Stop recommending 3090s; they are all but obsolete now. Not having native bf16 is a showstopper.
Is 4 TPS actually useful for anything?
That's around 350,000 tokens in a day. I don't track my Claude/Codex usage, but Kilocode with the free Grok model does, and I'm using between 3.3M and 50M tokens in a day (plus additional usage in Claude + Codex + Mistral Vibe + Amp Coder).
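For anyone who wants to check the arithmetic, here is a rough sketch. It assumes the box decodes nonstop for 24 hours at a constant rate, and it treats the usage figures as if they were all generated tokens, which probably overstates the gap since agent usage counts usually include prompt tokens too. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tokens_per_day(tps: float) -> int:
    """Tokens produced in 24 hours of uninterrupted decoding at `tps` tokens/second."""
    return round(tps * SECONDS_PER_DAY)


def days_to_generate(total_tokens: float, tps: float) -> float:
    """Days of nonstop decoding needed to produce `total_tokens` at `tps`."""
    return total_tokens / (tps * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Throughput range quoted for the CPU-only box, plus the 4 TPS round number.
    for tps in (3.5, 4.0, 4.25):
        print(f"{tps} TPS -> {tokens_per_day(tps):,} tokens/day")

    # Daily usage range quoted above (3.3M to 50M tokens), compared at 4 TPS.
    for usage in (3.3e6, 50e6):
        print(f"{usage:,.0f} tokens -> {days_to_generate(usage, 4.0):.1f} days at 4 TPS")
```

At 4 TPS this gives 345,600 tokens per day, so even the low end of that usage range would take more than a week of continuous generation.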
I'm trying to imagine a use case where I'd want this. Maybe running some small coding task overnight? But it just doesn't seem very useful.