logoalt Hacker News

wongarsutoday at 1:33 PM0 repliesview on HN

Cutting off at the exact cent is difficult, but a hard limit that triggers within one dollar of the actual limit should really be possible

If for some resources you can't sample measurements fast enough you could weaken it to "triggers within one dollar or five minutes after cost overrun, whichever comes later". But LLM APIs are one of those cases where time isn't a factor, your only issue is that if you only check quota before each inference a given query might bring you over