In my experience, the context window size by itself only tells half the story. Load a 200k-token document and ask a question about it, and the model answers just fine. Start a conversation that balloons past 100k tokens, though, and it loses coherence pretty quickly. So I'd guess batch size plays the more significant role.
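For what it's worth, here's a rough sketch of the two usage patterns I'm contrasting (Python, assuming tiktoken for token counts; the file name and helper are just illustrative, not from any particular API):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

# Pattern 1: one contiguous document plus a single question.
# The model sees ~200k tokens of internally consistent text in one shot.
document = open("big_report.txt").read()  # hypothetical file
single_shot_tokens = count_tokens(document) + count_tokens("What is the conclusion?")

# Pattern 2: a long back-and-forth. The same 100k tokens arrive as many
# interleaved user/assistant turns, and the full history is re-sent on
# every request, so the context keeps growing turn by turn.
history: list[dict] = []

def add_turn(role: str, content: str) -> int:
    history.append({"role": role, "content": content})
    return sum(count_tokens(m["content"]) for m in history)
```

Same token budget either way, but in the second pattern the context is fragmented across hundreds of turns instead of being one coherent block.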