logoalt Hacker News

mediamanyesterday at 8:05 PM0 repliesview on HN

On-prem versus cloud inference doesn't matter for concentration of power.

Concentration of power exists when the model makers are the same as (or control) the inference providers. Making a model is capital intensive, so there aren't many of them. Providing inference is not: I don't even need to own GPUs; I can rent them from those who do and then sell by the token. B300s cost less than $4 an hour currently.

Cloud can even be more effective at lowering concentration of power than on premise. Asking people to individually buy $20,000 of compute equipment plus power and cooling equipment to run a frontier model is not something they're going to do if they can just pay four-tenths of a cent per output token. If the only cloud inference providers are the big proprietary US titans, that means you're going to get far more power concentration than if open source inference providers are an alternative, because then I can just switch my API endpoint.