I think the logistics of calculating cost in real time is something that is extremely hard. I don't think there is one big cloud service provider that has hard limits instead of alerts.
As long as they revert the charge when notified of scenarios like this , and they have historically done so for many cases, it's fine. It's an acceptable workaround for a hard problem and the cost of doing business ( just like Credit Cards accept a certain amount of loss to fraud as part of business)
Why would it be hard to calculate cost? Multiply a fixed price * requests/time ? It doesn't have to be exact in real time, it just has to report something approximately useful in realtime.
It's absolutely not fine to be at the mercy of other people, that's what we buy cloud products or really any products for: So that we are not at the mercy of hardware faults, bad weather, bad teeth, hunger, thirst, [insert anything]
Cutting off at the exact cent is difficult, but a hard limit that triggers within one dollar of the actual limit should really be possible
If for some resources you can't sample measurements fast enough you could weaken it to "triggers within one dollar or five minutes after cost overrun, whichever comes later". But LLM APIs are one of those cases where time isn't a factor, your only issue is that if you only check quota before each inference a given query might bring you over
> I think the logistics of calculating cost in real time is something that is extremely hard.
What makes you think that?
Ridiculous. They are clearly not trying at all. A hard wall preventing going over budget by 100x in a couple hours is not some devilishly complicated decentralized system problem.
Don't tote the party line.
Same reason why Azure AI only has easy rate limits by minute, not by day or week or month. Open source proxy projects do it easily tho. Think about the incentives.
Going over a hard cap by 3% would be a reasonable failure to make, not by 30000%.
They don't have to compute it in real time. They can cut service when they detect it reached the cost and the difference is free of charge.
Overcharge protection doesn't have to be free. It could be +5% on prices or a fee of 25% when you reach the threshold.
They would have financial interest in calculating cost in real time and it'd magically become more and more precise over releases.