Not every use case is a cloud provider or tech giant.
Newer Blackwell GPUs do 200+ tokens per second on the largest models and tens of thousands on smaller ones. Most military applications need fast, smaller models, I'd imagine.
Also, custom chips are reportedly approaching an order of magnitude more performance for the price. Availability is the bottleneck right now, but that will be solved at some point.