But 400k was never usable in ChatGPT Plus/Pro subscriptions. It was nerfed down to 60-100k. If you submitted too long a prompt, they deleted the tokens at the end of your prompt before calling the model. Or if the chat got too long (still below 100k, however), they deleted your first messages. This was 3 months ago.
Can someone with an active sub check whether you can submit a full 400k prompt (or at least 200k) without any prompt truncation in the backend? I don't mean attaching a file, which uses RAG.
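To be concrete, the two behaviours I'm describing look roughly like this (a Python sketch of my reading of it, with count_tokens() as a crude stand-in for a real tokenizer; obviously not OpenAI's actual code):

  def count_tokens(text: str) -> int:
      # Crude stand-in for a real tokenizer: roughly 4 characters per token.
      return max(1, len(text) // 4)

  def truncate_prompt(prompt: str, limit: int) -> str:
      # Over-long single prompt: tokens at the end are silently dropped
      # before the model is called.
      if count_tokens(prompt) <= limit:
          return prompt
      return prompt[: limit * 4]

  def truncate_history(messages: list[str], limit: int) -> list[str]:
      # Over-long chat: the earliest messages are silently dropped
      # until the rest fits.
      kept = list(messages)
      while kept and sum(count_tokens(m) for m in kept) > limit:
          kept.pop(0)
      return kept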
> Or if the chat got too long (still below 100k however) they deleted your first messages. This was 3 months ago.
I can believe that, but it also seems really silly? If your max context window is X and the chat has approached that, instead of outright deleting the first messages, why not have your model summarise the first quarter of tokens and place that summary at the beginning of the log you feed as context? Since the chat history is (mostly) immutable, this only adds minimal overhead: you can cache the summarisation and don't have to redo it for each new message. (If the partially summarised log gets too long, you summarise again.)
Since I can come up with this technique in half a minute of thinking about the problem, and the OpenAI folks are presumably not stupid, I wonder what downside I'm missing.
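Roughly what I have in mind, as a sketch (summarize is a callable standing in for one extra model call, e.g. RollingContext(max_tokens=196_000, summarize=one_model_call); I'm not claiming this is how anyone actually implements it):

  def count_tokens(text: str) -> int:
      # Crude stand-in for a real tokenizer: roughly 4 characters per token.
      return max(1, len(text) // 4)

  class RollingContext:
      # Keeps a chat log under max_tokens by folding the oldest quarter of
      # tokens into a cached summary whenever the log grows too long.

      def __init__(self, max_tokens: int, summarize):
          self.max_tokens = max_tokens
          self.summarize = summarize  # callable str -> str (one model call)
          self.summary = ""           # cached summary of older messages
          self.tail = []              # recent messages kept verbatim

      def _total(self) -> int:
          head = count_tokens(self.summary) if self.summary else 0
          return head + sum(count_tokens(m) for m in self.tail)

      def add(self, message: str) -> None:
          self.tail.append(message)
          while self._total() > self.max_tokens and self.tail:
              # Fold the old summary plus the oldest quarter of the log
              # into a fresh summary: one extra model call, then cached.
              budget = max(1, self._total() // 4)
              head = [self.summary] if self.summary else []
              used = 0
              while self.tail and used < budget:
                  m = self.tail.pop(0)
                  head.append(m)
                  used += count_tokens(m)
              self.summary = self.summarize("\n".join(head))

      def context(self) -> list[str]:
          # What actually gets sent to the model: summary first, then tail.
          return ([self.summary] if self.summary else []) + self.tail

The summary only gets recomputed when the log outgrows the budget again, so the overhead is one extra model call every quarter-of-context or so, not one per new message.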
API use was not nerfed in this way.
Context windows for web:

  Fast (GPT‑5.2 Instant): Free: 16K; Plus/Business: 32K; Pro/Enterprise: 128K
  Thinking (GPT‑5.2 Thinking): all paid tiers: 196K
https://help.openai.com/en/articles/11909943-gpt-52-in-chatg...