Re: that 1M context thing, I wonder if it's just an abstraction layer that compresses/summarizes parts of the context so it fits into a smaller effective window?
You don’t normally compress the system prompt, though I guess the model might treat its own summary with more authority. This article [0] discusses the problem well.
Though I feel it’s most likely just that models tend to degrade on large contexts (which can be shown experimentally). My guess is they aren’t RLed on long contexts as much, but that’s just a guess.
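A minimal sketch of what that compression might look like (all names, the budget, and the word-count "tokenizer" are my own assumptions; a real system would call the model itself to produce the summary, and would leave the system prompt untouched):

```python
def summarize(messages):
    # Placeholder: a real system would ask the model for a summary here.
    return "[summary of %d earlier turns]" % len(messages)

def compress_context(messages, budget, keep_recent=2):
    """Keep the most recent turns verbatim; fold older turns into one summary."""
    def cost(msgs):
        return sum(len(m.split()) for m in msgs)  # crude token estimate
    if cost(messages) <= budget or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent

history = ["turn %d: %s" % (i, "blah " * 50) for i in range(10)]
compact = compress_context(history, budget=200)
print(len(compact))  # → 3: one summary message plus the 2 recent turns
```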
[0]: https://openai.com/index/instruction-hierarchy-challenge/