I don't know why people are ignoring this and posting hyperbolic statements like "it's all over for OpenAI and Google".
One of the cheaper Gemini models is actually only 8B and would be a perfect candidate for release as a FOSS Gemma model, but that Gemini 8B model contains hints of the tricks they used to achieve long context, so as a matter of business strategy they haven't released it as a Gemma FOSS model yet.
Several Chinese models already go up to 128k so it's not like they don't know how to scale it up, but models that handle long context well also take more time and compute to train, so it makes sense that they're iterating on quality of outputs rather than increasing length right now. I wouldn't read much into it wrt moats or lack thereof.
Here’s why: you can chain the prompts, CoT, and answers. Let me explain.
Prompt 1 (64k) → CoT (32k) → Answer 1 (8k)
The 32k of CoT is not counted against the 64k input, so that first call effectively spans 64k + 32k + 8k.
Prompt 2 (32k) + previous CoT 1 (32k - this time it is counted, because we are chaining and these are two separate API calls) → Answer 2 (8k)
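A rough sketch of that chaining pattern, assuming a hypothetical `call_model()` helper standing in for whatever provider API you use, and assuming it returns the CoT and the answer as separate strings (R1-style APIs expose the reasoning separately):

```python
def call_model(prompt: str) -> tuple[str, str]:
    """Placeholder for a real API call; returns (cot, answer)."""
    raise NotImplementedError  # wire up your provider's client here

# Call 1: the ~32k CoT is generated on top of the ~64k prompt,
# so it does not eat into the input budget of this call.
prompt_1 = "..."                      # up to ~64k tokens of task + context
cot_1, answer_1 = call_model(prompt_1)

# Call 2: now the previous CoT *is* part of the input, because we are
# explicitly pasting it into a fresh request (a separate API call).
prompt_2 = (
    "Previous reasoning:\n" + cot_1 + "\n\n"   # ~32k tokens carried over
    "Next step:\n" + "..."                     # ~32k tokens of new input
)
cot_2, answer_2 = call_model(prompt_2)
```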
Another way to optimize this is to use another model to pick out only the correct CoT from the current answer and pass that as the CoT for the next prompt. (If you are feeling adventurous enough, you could just use R1 to select the correct CoT, but I think it will go insane trying to figure out the previous and current CoT.)
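A minimal sketch of that filtering idea, assuming a hypothetical `select_model()` stand-in for the second (cheaper) model:

```python
def select_model(prompt: str) -> str:
    """Placeholder for the filtering model's API call; returns plain text."""
    raise NotImplementedError

def distill_cot(cot: str, answer: str) -> str:
    # Ask the second model to keep only the reasoning that actually
    # led to the answer, so the carried-over CoT shrinks before it is
    # pasted into the next prompt.
    instruction = (
        "Below is a chain of thought and the final answer it produced.\n"
        "Return only the reasoning steps that were actually needed to reach "
        "the answer; drop dead ends and repetition.\n\n"
        f"Chain of thought:\n{cot}\n\nAnswer:\n{answer}"
    )
    return select_model(instruction)

# Usage: shrink CoT 1 before chaining it into prompt 2.
# distilled = distill_cot(cot_1, answer_1)
```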