You can keep scaling down! I spent $2k on an old dual-socket Xeon workstation with 768GB of RAM - I can run DeepSeek-R1 at ~1-2 tokens/sec.
I did the same, then put in 14 3090s. It's a little power-hungry but fairly impressive performance-wise. The hardest parts are power distribution and riser cards, but I found good solutions for both.
And if you get bored of that, you can flip the RAM for more than you spent on the whole system!
And heat the whole house in parallel.
Just keep going! 2TB of swap disk for 0.0000001 t/sec