
cubefox • yesterday at 6:03 PM • 1 reply

Okay, then let's see whether real linear architectures, like Gated DeltaNet or Mamba-3, show up in some larger models. I don't believe there is a "lower bound" showing that they can never match (or exceed) the real-world performance of quadratic attention. (Perfect recall in unrealistic needle-in-a-haystack tests doesn't count.)
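
To make the linear-vs-quadratic distinction concrete, here is a minimal pure-Python sketch (not code from any of these models). It assumes a single head and the gated delta rule recurrence as given in the Gated DeltaNet paper, S_t = α_t S_{t-1}(I - β_t k_t k_tᵀ) + β_t v_t k_tᵀ with output o_t = S_t q_t. The function names are hypothetical, and the gates α_t (decay) and β_t (write strength) are passed in directly as an assumption, whereas in the real architecture they are computed from the input. Real implementations also use a chunked parallel form rather than this token-by-token loop.

```python
import math


def softmax_attention(qs, ks, vs):
    """Causal softmax attention: step t compares q_t with all t+1 keys seen so far,
    so total work grows as O(N^2) in sequence length N."""
    d_v = len(vs[0])
    outputs = []
    for t, q in enumerate(qs):
        # Dot-product score against every earlier (and the current) key.
        scores = [sum(qi * ki for qi, ki in zip(q, ks[j])) for j in range(t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        outputs.append([sum(w * vs[j][i] for j, w in enumerate(weights)) / z
                        for i in range(d_v)])
    return outputs


def gated_delta_rule(qs, ks, vs, alphas, betas):
    """Recurrent gated delta rule with a fixed d_v x d_k state S:
        S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T
        o_t = S_t q_t
    Per-step cost is O(d_k * d_v), independent of position, so total work is O(N)."""
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_k for _ in range(d_v)]  # fixed-size memory, never grows
    outputs = []
    for q, k, v, a, b in zip(qs, ks, vs, alphas, betas):
        # S k: the value the state currently associates with key k.
        Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
        # S (I - b k k^T) = S - b (S k) k^T erases part of the old association,
        # a decays the whole state, and b v k^T writes the new value.
        S = [[a * (S[i][j] - b * Sk[i] * k[j]) + b * v[i] * k[j] for j in range(d_k)]
             for i in range(d_v)]
        # Read out with the query.
        outputs.append([sum(S[i][j] * q[j] for j in range(d_k)) for i in range(d_v)])
    return outputs
```

Each recurrent step costs O(d_k·d_v) no matter how long the sequence is, while softmax attention's step t touches all t+1 earlier keys. The price is that the fixed-size state S can only hold a limited number of nearly orthogonal key/value associations. For example, with α = β = 1 and the unit key [1, 0], writing [1, 1] and then [5, 7] under that key and querying with it returns [5.0, 7.0]: the delta rule replaced the old value instead of keeping both, which is the bounded-recall trade-off against quadratic attention.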

Replies

andy12_ • yesterday at 10:07 PM

I'm also sure that some kind of linear architecture is possible. After all, humans don't have perfect O(N^2) recall either.
