Hacker News

storus · last Thursday at 11:31 AM

QKV attention is just a probabilistic lookup table, where the Q, K, and V projections let you adjust the input/output dimensions to fit your NN block. If your Q exactly matches some known K (from training), you get (approximately) the exact corresponding V; otherwise you get a linear combination of all the Vs, weighted by the attention scores.
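The lookup-table view above can be sketched in a few lines of numpy (a minimal illustration, not from the thread; the key/value matrices are made up for the demo):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Similarity of each query row to every stored key.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)   # each row sums to 1
    return weights @ V          # weighted mix of all values

# Three orthogonal "known" keys, each mapped to a distinct value.
# Large key magnitudes make the softmax peaky, so an exact match
# behaves like a hard table lookup.
K = np.eye(3) * 10.0
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Query equals the first key: output is ~exactly V[0].
exact = attention(K[0:1], K, V)

# Query halfway between keys 0 and 1: output is ~the average of V[0] and V[1].
blend = attention(np.array([[5.0, 5.0, 0.0]]), K, V)
```

With an exact key match the softmax puts nearly all its weight on one value; an in-between query returns the weighted blend the comment describes.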


Replies

art_mach · last Thursday at 2:21 PM

It's not, please read the thread above.
