Here’s why: you can chain the prompts, CoTs, and answers across calls. Let me explain.
Call 1: Prompt 1 (64k) → CoT 1 (32k) → Answer 1 (8k)
The 32k of CoT is not counted against the 64k input context, so the first call actually spans 64k + 32k + 8k = 104k tokens.
Call 2: Prompt 2 (32k) + previous CoT 1 (32k; this time it is counted as input, because we are chaining and these are two separate API calls) → Answer 2 (8k)
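Here is a minimal sketch of the two chained calls, assuming an OpenAI-compatible client pointed at DeepSeek’s endpoint, where R1’s reasoning trace comes back in a separate `reasoning_content` field (other providers may expose it differently, or not at all):

```python
from openai import OpenAI

# Placeholder endpoint/key; adjust to your provider.
client = OpenAI(base_url="https://api.deepseek.com", api_key="sk-...")
MODEL = "deepseek-reasoner"

prompt_1 = "..."  # your first (up to 64k) prompt
prompt_2 = "..."  # your follow-up prompt

# Call 1: the CoT comes back alongside the answer, but it is not
# counted against the 64k input context of this call.
resp1 = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": prompt_1}],
)
answer_1 = resp1.choices[0].message.content
# DeepSeek's R1 API returns the reasoning trace separately;
# field name is an assumption if you use a different provider.
cot_1 = resp1.choices[0].message.reasoning_content

# Call 2: chaining. The previous CoT is pasted into the new prompt,
# so this time those 32k tokens DO count as input.
resp2 = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": f"Previous reasoning:\n{cot_1}\n\nTask:\n{prompt_2}",
    }],
)
answer_2 = resp2.choices[0].message.content
```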
Another way to optimize this is to use a second model to pick out only the correct CoT steps from the current answer and pass just those as the CoT for the next prompt. (If you are feeling adventurous, you could use R1 itself to select the correct CoT, but I suspect it would go insane trying to untangle the previous CoT from the current one.)
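A rough sketch of that filtering step, reusing the client from above and assuming a cheaper non-reasoning model (`deepseek-chat` here) as the selector; the filter prompt wording is purely illustrative:

```python
def filter_cot(client, raw_cot: str, question: str) -> str:
    """Ask a cheaper model to keep only the CoT steps that actually
    led to the answer, dropping dead ends and backtracking."""
    resp = client.chat.completions.create(
        model="deepseek-chat",  # assumed filter model
        messages=[{
            "role": "user",
            "content": (
                "Below is a chain of thought for this question:\n"
                f"{question}\n\n"
                "Return only the reasoning steps that are correct and "
                "directly lead to the final answer. Drop dead ends.\n\n"
                f"{raw_cot}"
            ),
        }],
    )
    return resp.choices[0].message.content

# Pass the trimmed CoT forward instead of the full 32k trace:
trimmed_cot = filter_cot(client, cot_1, prompt_1)
```

The payoff is that Call 2 now carries only the useful fraction of the previous 32k trace as input, at the price of one extra cheap call.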