Hacker News

himata4113 · today at 6:58 PM · 2 replies

This seems similar to gpt-pro; they just have a very large attention window (which is why it's so expensive to run). The true attention window of most models is 8096 tokens.
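A hard "attention window" of W tokens would mean each query position attends only to the W most recent key positions, no matter how long the advertised context is. A minimal sketch of what such a mask would look like, assuming a simple contiguous causal window (the 8096 figure above is the commenter's claim, not an established number):

    import numpy as np

    def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
        """Boolean mask: True where query i may attend to key j.
        Causal (j <= i) and restricted to the `window` most recent
        tokens (i - j < window)."""
        i = np.arange(seq_len)[:, None]  # query positions
        j = np.arange(seq_len)[None, :]  # key positions
        return (j <= i) & (i - j < window)

    # Toy example: 8 tokens, window of 4.
    # Token 7 attends to tokens 4-7 only, never to tokens 0-3.
    print(sliding_window_mask(8, 4).astype(int))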


Replies

appcustodian2 · today at 8:40 PM

Source on the 8096 tokens number? I'm vaguely aware that some previous models attended more to the beginning and end of conversations, which doesn't seem to fit a simple contiguous "attention window" within the greater context, but I'd love to know more.
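The beginning-and-end behavior described here matches the "attention sink" observation (popularized by the StreamingLLM work), where a few initial tokens plus a recent window receive most of the attention mass; whether that is what the parent meant is an assumption. A sketch of that mask shape, for contrast with the contiguous window above:

    import numpy as np

    def sink_plus_window_mask(seq_len: int, n_sink: int, window: int) -> np.ndarray:
        """True where query i may attend to key j: the first `n_sink`
        tokens stay visible, plus a recent causal window."""
        i = np.arange(seq_len)[:, None]
        j = np.arange(seq_len)[None, :]
        sink = j < n_sink        # earliest tokens always kept
        recent = i - j < window  # `window` most recent tokens
        return (j <= i) & (sink | recent)

    # Token 9 with 2 sink tokens and a window of 4
    # sees positions {0, 1, 6, 7, 8, 9} but not 2-5.
    print(sink_plus_window_mask(10, n_sink=2, window=4).astype(int))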

thegeomaster · today at 7:37 PM

What's the "attention window"? Are you alleging these frontier models use something like SWA (sliding-window attention)? Seems highly unlikely.