$500k is a vast overestimation. Maybe for massive concurrency at FP8 or even BF16.
NVFP4 at reasonable speeds (~120 tok/s) and concurrency is possible for $80k to $90k at today's prices, maybe even less. That buys you 6 RTX PRO 6000 Blackwells, a decent CPU and motherboard, and a power supply: 576 GB of VRAM.
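The 576 GB figure is 6 × 96 GB. As a rough illustration of why the number format matters for that budget, here is a minimal sketch that compares weight memory at BF16, FP8 and NVFP4. It assumes 96 GB per card and about 4.5 effective bits per weight for NVFP4 (4-bit values plus one FP8 scale per 16-element block). The 700B parameter count is purely hypothetical, and KV cache and activations are ignored.

```python
# Rough VRAM budget for a 6-card build. Assumptions (not from the thread):
# 96 GB per card; effective bits per weight of 16 (BF16), 8 (FP8) and
# ~4.5 (NVFP4: 4-bit values + one FP8 scale per 16-element block).
# The model size used below is hypothetical.

GB_PER_CARD = 96
BITS_PER_WEIGHT = {"BF16": 16.0, "FP8": 8.0, "NVFP4": 4.5}


def total_vram_gb(num_cards: int, gb_per_card: int = GB_PER_CARD) -> int:
    """Total VRAM across all cards, in GB."""
    return num_cards * gb_per_card


def weight_gb(params_billion: float, fmt: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes); ignores KV cache."""
    # params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * BITS_PER_WEIGHT[fmt] / 8


if __name__ == "__main__":
    vram = total_vram_gb(6)
    print(f"Total VRAM: {vram} GB")
    params_b = 700  # hypothetical model size, in billions of parameters
    for fmt in BITS_PER_WEIGHT:
        w = weight_gb(params_b, fmt)
        headroom = vram - w
        status = f"{headroom:.0f} GB left for KV cache" if headroom > 0 else "does not fit"
        print(f"{fmt}: {w:.0f} GB of weights, {status}")
```

Under these assumptions only the NVFP4 weights fit with room to spare, which is the gap between this build and the $500k estimate for BF16 or FP8.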
You could do it for under $50k if you're OK with 40 tok/s decode, ~1200 tok/s prefill.
How fast will the hardware become outdated? Are there big improvements expected in the next 3 years?
Yes, a single GB300 workstation also does it, probably at even more than 120 tok/s.
Official price is $85k...