Hacker News

londons_explore today at 9:41 AM

If it actually cost that much RAM, they would almost certainly add extra things to the API to manage cache lifetime, i.e. a 'please cache this for X minutes' flag, or a setting for a single re-use cache (the most common use case).


Replies

cyanydeez today at 10:57 AM

https://platform.claude.com/docs/en/build-with-claude/prompt...

suggests they can cache outside the GPU.