Hacker News

rar00 · yesterday at 12:33 PM

This argument works better for state space models. A transformer still steps through the context one token at a time; it doesn't maintain an internal state of 1e18 values.
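For concreteness, here is a toy sketch (hypothetical code, not from either comment) of the distinction being drawn: at each decoding step a transformer attends over a key/value cache that grows with the context length T, while a state space model folds each token into a fixed-size recurrent state, so its memory footprint does not grow with T.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy model width

def transformer_step(q, kv_cache):
    """One decoding step: attend over ALL cached keys/values.
    Per-step memory and compute grow with the context length T."""
    keys, values = kv_cache            # shapes (T, d) and (T, d)
    scores = keys @ q / np.sqrt(d)     # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values            # (d,)

def ssm_step(x, state, A, B, C):
    """One recurrent step: fold the token into a fixed-size state.
    The state stays O(d) no matter how long the context is."""
    state = A @ state + B @ x          # (d,)
    return C @ state, state

T = 1000                               # "long" toy context
tokens = rng.standard_normal((T, d))

# Transformer: the cache holds all T keys and values.
kv_cache = (tokens, tokens)
out_tf = transformer_step(rng.standard_normal(d), kv_cache)

# SSM: the carried state is a single length-d vector regardless of T.
A, B, C = 0.9 * np.eye(d), np.eye(d), np.eye(d)
state = np.zeros(d)
for x in tokens:
    out_ssm, state = ssm_step(x, state, A, B, C)

print(out_tf.shape, state.shape)       # both (16,), but the cache is (1000, 16)
```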


Replies

mgraczyk · yesterday at 4:52 PM

That doesn't matter. Are you familiar with any theoretical results in which the computation is limited in ways that practically matter when the context length is very long? I am not.