The hardware needs to catch up, I think. I asked ChatGPT (lol) how much it would cost to build a DeepSeek server that runs at a reasonable speed, and it quoted ~$400k-800k (8-16 H100s plus the rest of the server).
Guess we are still in the 1970s era of AI computing. We need to hope for a few more step changes or some breakthrough in model size.
You can run most open models (excluding kimi-k2) on hardware that costs anywhere from $45-85k (to be fair, specced before the VRAM wars of late 2025, so add maybe $10k). 4-8 RTX PRO 6000s plus all the other bits and pieces gives you a machine you can host locally that runs very capable models at several quants (glm4.7, minimax2.1, devstral, dsv3, gpt-oss-120b, qwens, etc.), with enough speed and parallel sessions for a small team (of agents or humans).
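A back-of-envelope sizing sketch in Python, to show where a number like "4-8 RTX PRO 6000s" comes from. The 96 GB per card, the ~20% overhead for KV cache and runtime, and the parameter counts are my assumptions, not the parent's:

    import math

    # Rough rule of thumb: weights take params * bits_per_weight / 8 bytes,
    # plus ~20% headroom for KV cache, activations, and runtime overhead.
    def vram_needed_gb(params_b: float, bits_per_weight: float,
                       overhead: float = 0.20) -> float:
        """Approximate VRAM (GB) to serve a params_b-billion-param model."""
        weights_gb = params_b * bits_per_weight / 8
        return weights_gb * (1 + overhead)

    def gpus_required(params_b: float, bits_per_weight: float,
                      gpu_vram_gb: float = 96.0) -> int:
        """Cards needed to fit the model, assuming 96 GB each (RTX PRO 6000)."""
        return math.ceil(vram_needed_gb(params_b, bits_per_weight) / gpu_vram_gb)

    # MoE models like DeepSeek-V3 keep all ~671B params resident even though
    # only ~37B are active per token, so the total count has to fit in VRAM.
    print(gpus_required(671, 4))   # DeepSeek-V3 at 4-bit  -> 5 cards
    print(gpus_required(120, 8))   # gpt-oss-120b at 8-bit -> 2 cards

The 4-bit DeepSeek figure lands right in the 4-8 card range quoted above; lower quants or smaller models shrink it further.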
The problem is that Moore's law is dead. Silicon isn't advancing as fast as we once envisioned, we're fighting quantum tunneling effects as we cram ever smaller structures onto silicon, and R&D costs for manufacturing these chips are climbing rapidly. There's a limit to how far we can fight physics, and unless we discover a totally new paradigm that alleviates these issues (optical computing?), we're going to see diminishing returns at the tail end of the sigmoid-like tech advancement cycle.