If it actually cost that much RAM, they would almost certainly add extra things to the API to manage cache lifetime, i.e. a 'please cache this for X minutes' flag, or a setting for a single re-use cache (the most common use case).
https://platform.claude.com/docs/en/build-with-claude/prompt...
suggests they can cache outside the GPU.
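
To make the two lifetime controls concrete, here is a rough sketch of what I mean. It is a toy in-memory cache, not anyone's real API: `PrefixCache`, `ttl_seconds`, and `single_use` are names I made up for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PrefixCache:
    """Toy cache with a per-entry TTL and an optional single re-use policy (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, expiry timestamp, single_use flag)
        self._entries: Dict[str, Tuple[str, float, bool]] = {}

    def put(self, key: str, value: str, ttl_seconds: float, single_use: bool = False) -> None:
        # "Please cache this for X seconds", optionally evicting after the first hit.
        self._entries[key] = (value, self._clock() + ttl_seconds, single_use)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at, single_use = entry
        if self._clock() >= expires_at:
            # Expired: free the memory and report a miss.
            del self._entries[key]
            return None
        if single_use:
            # Single re-use: the first hit is also the last.
            del self._entries[key]
        return value
```

With something like this, the caller decides how long the memory stays tied up, instead of the provider guessing.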