These perverse incentives run at the heart of almost all Developer Software as a Service tooling. Using someone else's hosted model incentivizes increasing token usage, but it's nothing special about AI.
Consider Database-as-a-service companies: They're not incentivized to optimize on CPU usage, they charge per cpu. They're not incentivized to improve disk compression, they charge for disk-usage. There are several DB vendors who explicitly disable disk compression and happily charge for storage capacity.
When you run the software yourself, or the model yourself, the incentives aligned: use less power, use less memory, use less disk, etc.
My favourite example of this is the recent trend towards “wide events” replacing logs and metrics… spearheaded and popularised by companies that charge by the gigabytes ingested.
> When you run the software yourself, or the model yourself, the incentives aligned: use less power, use less memory, use less disk, etc.
But my team's time is soooo valuable. It's sooo sooo sooo valuable. Oh and we can't afford to hire anyone else either. But our time its sooo valuable. We need these tools!