Hacker News

mogili1 yesterday at 6:19 PM | 2 replies

A rate limit is essentially a token limit.


Replies

ibejoeb yesterday at 7:28 PM

It depends on how it's implemented. If it's a fixed window, then your absolute ceiling is tokens per window times the number of windows in a month. If it's a function of other usage, like a timeshare, you're still paying a fixed price for the month, and you get what you get without paying more per token. There's also an intrinsic limit based on how many tokens the model can process on that GPU in a month anyway, even if you're the only user.
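
To make that arithmetic concrete, here is a rough sketch in Python. The function names and every number in it (window size, tokens per window, GPU throughput, a 30-day month) are made-up assumptions for illustration, not any provider's actual limits.

```python
def fixed_window_ceiling(tokens_per_window: int, window_hours: float, days: int = 30) -> int:
    """Most tokens you could use in a month if you maxed out every window."""
    # Only whole windows count; a partial window at the end is ignored.
    windows_in_month = int(days * 24 // window_hours)
    return tokens_per_window * windows_in_month


def hardware_ceiling(tokens_per_second: float, days: int = 30) -> int:
    """Most tokens the hardware could process in a month, even for a single user."""
    return int(tokens_per_second * days * 24 * 3600)


def effective_ceiling(tokens_per_window: int, window_hours: float,
                      tokens_per_second: float, days: int = 30) -> int:
    """Whichever limit bites first."""
    return min(
        fixed_window_ceiling(tokens_per_window, window_hours, days),
        hardware_ceiling(tokens_per_second, days),
    )


# Hypothetical plan: 1M tokens per 5-hour window, on a GPU doing 50 tokens/s.
plan_limit = fixed_window_ceiling(1_000_000, 5)
gpu_limit = hardware_ceiling(50)
overall = effective_ceiling(1_000_000, 5, 50)
```

With these invented numbers the plan allows 144 windows a month, and the hardware ceiling turns out to be the tighter of the two. Either way the monthly total is bounded.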

delusional yesterday at 10:51 PM

Time x capacity is also a limit. There's always a limit.