choppaface • today at 8:54 AM

Or maybe they don’t actually cache (fully) but lie and just don’t charge the user right now. At least half the users, who are probably also the ones using the most similar tokens / prompts, wouldn’t really notice the difference in latency (or care).

Replies

londons_explore • today at 9:41 AM

If it actually cost that much RAM, they would almost certainly add extra things to the API to manage cache lifetime, i.e. a 'please cache this for X minutes' flag, or a setting for a single-reuse cache (the most common use case).
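
To make the two knobs concrete, here is a rough sketch of what a caller-controlled cache lifetime could look like. Everything in it is hypothetical: `PromptCache`, `ttl_seconds`, and `single_use` are names made up for illustration and are not taken from any provider's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class _Entry:
    value: str          # stand-in for the cached prefix state
    expires_at: float   # clock reading after which the entry is stale
    single_use: bool    # evict after the first successful read


class PromptCache:
    """Hypothetical cache whose lifetime is chosen by the caller."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, _Entry] = {}

    def put(self, key: str, value: str, ttl_seconds: float,
            single_use: bool = False) -> None:
        # "Please cache this for X minutes" maps to ttl_seconds.
        self._entries[key] = _Entry(value, self._clock() + ttl_seconds, single_use)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self._clock() >= entry.expires_at:
            # Expired: free the memory and report a miss.
            del self._entries[key]
            return None
        if entry.single_use:
            # Single-reuse mode: the first hit also releases the entry.
            del self._entries[key]
        return entry.value
```

With something like this, a caller who knows a prefix will be reused exactly once could pass `single_use=True` and the memory would be released on the first hit instead of sitting there until the TTL runs out.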
