logoalt Hacker News

choppafacetoday at 8:54 AM1 replyview on HN

or maybe they don’t actually cache (fully) but lie and just don’t charge the user right now. at least half the users, who are probably also using the most similar tokens / prompts, wouldn’t really know the difference in latency (or care)


Replies

londons_exploretoday at 9:41 AM

If it actually cost that much RAM, they would almost certainly add extra things to the API to manage cache lifetime. Ie. A 'please cache this for X minutes' flag, or a setting for a single re-use cache (the most common use case)

show 1 reply