
kristjansson today at 4:15 PM

> self-attention is efficiently computable to arbitrary precision with constant cost per token

This paper at least aspires to reproduce 'true' attention, which distinguishes it from many of the others. TBD whether it's successful in that.


Replies

logicchains today at 4:47 PM

It can't be successful at that, any more than 1+1 can equal 3. Fundamentally, if every token must be able to look at every previous token without loss of information, the cost is O(n^2): N tokens each attending to up to N tokens is quadratic. Any sub-quadratic attention must therefore lose some information, and so can't support perfect recall on longer sequences.
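To make the counting argument concrete, here's a minimal NumPy sketch of plain causal attention (function and variable names are mine, purely illustrative): every query scores every earlier key, so the number of (query, key) pairs grows as n*(n+1)/2, i.e. O(n^2) in sequence length.

    import numpy as np

    def naive_causal_attention(Q, K, V):
        """Reference (quadratic) causal self-attention.
        Q, K, V: arrays of shape (n, d). Returns (output, pair_count)."""
        n, d = Q.shape
        scores = Q @ K.T / np.sqrt(d)              # n x n matrix: one score per (query, key) pair
        mask = np.tril(np.ones((n, n), dtype=bool))
        scores = np.where(mask, scores, -np.inf)   # causal mask: token i only sees tokens <= i
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        pair_count = int(mask.sum())               # n*(n+1)/2 attended pairs -> O(n^2) work
        return weights @ V, pair_count

Doubling n roughly quadruples pair_count; any scheme that does asymptotically less work than this has to compress or drop some of those pairwise interactions.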

energy123 today at 4:20 PM

It's like claims of room-temperature superconductors or Millennium Prize solutions. Earth-shattering if true. It'd be such a black swan. Terrible for Nvidia.
