Hacker News

cubefox, yesterday at 6:03 PM

Okay, then let's see whether truly linear architectures, like Gated DeltaNet or Mamba-3, show up in some larger models. I don't believe there is a "lower bound" proving that they can never match (or exceed) the real-world performance of quadratic attention. (Perfect recall in unrealistic needle-in-a-haystack tests doesn't count.)


Replies

andy12_, yesterday at 10:07 PM

I'm also sure that some kind of linear architecture is possible. After all, humans don't have N^2 perfect recall either.