What kind of hardware/price does it take to run those?
For an 8-bit quant (what people call "near lossless") you are looking at something like 4x MI350X, which comes out to about $150k once you add the rest of the server. More if you go with Nvidia instead of AMD. More again if you want more than maybe 8x concurrency.
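For a rough feel of where a GPU count like that comes from, here is a back-of-the-envelope sketch. It assumes the weights dominate memory, that an 8-bit quant takes about one byte per parameter, and that each card has 288 GB of memory (my assumption for an MI350X-class part; check the spec sheet). The 20% overhead for KV cache and activations is a made-up placeholder, and the parameter counts are illustrative, not any specific model.

```python
import math


def gpus_needed(params_billion, bits_per_weight=8, gpu_mem_gb=288,
                overhead_fraction=0.2):
    """Estimate how many GPUs are needed just to hold a quantized model.

    All defaults are assumptions for illustration:
    - gpu_mem_gb=288 is an assumed per-card memory size
    - overhead_fraction=0.2 is a placeholder for KV cache and activations
    """
    # 1e9 params * (bits / 8) bytes each = that many GB of weights
    weight_gb = params_billion * bits_per_weight / 8
    total_gb = weight_gb * (1 + overhead_fraction)
    return math.ceil(total_gb / gpu_mem_gb)


if __name__ == "__main__":
    # Hypothetical model sizes, in billions of parameters
    for size in (400, 700, 900):
        print(size, "B params ->", gpus_needed(size), "GPUs")
```

More concurrency means more KV cache, so in practice the overhead fraction grows with the number of simultaneous requests.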
But prices are changing rapidly, and not for the better.
Nvidia will sell you an entire server rack ready for inference. Or maybe you can roll your own Blackwell-based system.
We’re approaching a world where running a premier frontier model is possible on a workstation. Nvidia’s next generation will probably include something under $30k that looks like a desktop. It sounds expensive until you look at your Anthropic bill.
The unit economics for open models are similar to cloud computing. You can save a ton on expenses by buying the hardware, but it requires a lot of in-house expertise, and you get the most value if you keep the system running around the clock. The big catch is that open models are usually two quarters behind the frontier, and your competitors are probably trying to get access to Mythos.
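To make the buy-vs-rent tradeoff concrete, here is a minimal sketch of the arithmetic. The $150k hardware figure is the estimate from earlier in the thread; the monthly running cost, the API bill, the hourly running cost, and the three-year lifetime are all hypothetical numbers, so plug in your own.

```python
def breakeven_months(hardware_cost, monthly_opex, monthly_api_bill):
    """Months until owned hardware pays for itself versus an API bill.

    Returns None if running the box costs as much as (or more than) the API.
    """
    savings = monthly_api_bill - monthly_opex
    if savings <= 0:
        return None
    return hardware_cost / savings


def cost_per_busy_hour(hardware_cost, lifetime_years, hourly_opex, utilization):
    """Effective cost of each hour the system is actually doing work.

    utilization is the fraction of wall-clock time the system is busy (0-1].
    Idle hours still cost money, so low utilization inflates the real price.
    """
    lifetime_hours = lifetime_years * 365 * 24
    hourly_total = hardware_cost / lifetime_hours + hourly_opex
    return hourly_total / utilization


if __name__ == "__main__":
    # Hypothetical: $2,000/month power and hosting, $14,500/month API bill
    print(breakeven_months(150_000, 2_000, 14_500))
    # Hypothetical: 3-year lifetime, $2/hour running cost
    for u in (1.0, 0.5, 0.25):
        print(u, round(cost_per_busy_hour(150_000, 3, 2.0, u), 2))
```

The second function is why around-the-clock use matters: halving utilization doubles what each useful hour costs you. Neither function prices in the staff time for the in-house expertise.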