jzig (yesterday at 4:16 PM)

At what point along the 1M window does context become "long" enough that this degradation occurs?


Replies

daemonologist (yesterday at 4:36 PM)

The benchmark GP mentioned measures performance at 128k-256k context (there's another bucket at 524k-1024k, where 4.6 scored 78.3% and 4.7 scored 32.2%).

The longer the context, the worse the performance; there isn't really a qualitative step change in capability. (If there is one, imo it happens around 8k-16k tokens, much sooner than is relevant for multi-turn coding tasks - see e.g. this old benchmark: https://github.com/adobe-research/NoLiMa )