Hacker News

mnicky, last Thursday at 1:08 AM

IIRC, isn't the symmetry between Q and K also broken by the direction of the softmax? I mean, row-wise vs. column-wise application yields a different interpretation.
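To make the row vs. column point concrete, here is a minimal, dependency-free sketch. It assumes the usual convention that rows of the score matrix index queries and columns index keys; the `scores` values and the helper names (`softmax_rows`, `softmax_cols`) are made up for illustration and do not come from any particular library.

```python
import math

def softmax(v):
    """Numerically stable softmax of a flat list."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(s):
    # Normalize each row: every query distributes weight over the keys.
    return [softmax(row) for row in s]

def softmax_cols(s):
    # Normalize each column: every key distributes weight over the queries.
    return transpose(softmax_rows(transpose(s)))

# Hypothetical 2x3 score matrix (assumed: rows = queries, columns = keys).
scores = [[1.0, 2.0, 0.0],
          [0.5, 0.5, 3.0]]

by_rows = softmax_rows(scores)
by_cols = softmax_cols(scores)

row_sums = [sum(r) for r in by_rows]             # each close to 1
col_sums = [sum(c) for c in transpose(by_cols)]  # each close to 1
```

Swapping the roles of Q and K transposes the score matrix, so a row-wise softmax after the swap is the same as a column-wise softmax before it. Fixing the softmax direction is therefore what decides which side plays the query.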


Replies

ebonnafoux, last Thursday at 7:44 AM

Yes, but in practice, if you compute K=X.wk, Q=X.wq and then K.tQ, you do three matrix multiplications. Wouldn't it be faster to compute W=wk.twq beforehand and then just X.W.tX, which would be just two matrix multiplications? Is there something I am missing?
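A small sketch of the two paths in the comment above, in plain Python so it runs without dependencies. The sizes `n`, `d`, and `d_k` (tokens, model width, per-head width) and all helper names are hypothetical. The two counting functions tally scalar multiplications for naive matrix products only; they are a rough cost model, not a benchmark.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, d, d_k = 4, 6, 2  # hypothetical sizes: tokens, model width, head width

X = rand_matrix(n, d, rng)
wk = rand_matrix(d, d_k, rng)
wq = rand_matrix(d, d_k, rng)

# Path 1: K = X.wk, Q = X.wq, then K.tQ (three products per input).
K = matmul(X, wk)
Q = matmul(X, wq)
scores_separate = matmul(K, transpose(Q))

# Path 2: W = wk.twq computed once, then X.W.tX (two products per input).
W = matmul(wk, transpose(wq))
scores_merged = matmul(matmul(X, W), transpose(X))

max_diff = max(abs(a - b)
               for ra, rb in zip(scores_separate, scores_merged)
               for a, b in zip(ra, rb))

def mults_separate(n, d, d_k):
    # X.wk and X.wq cost n*d*d_k each; K.tQ costs n*n*d_k.
    return 2 * n * d * d_k + n * n * d_k

def mults_merged(n, d, d_k):
    # W is d x d and assumed precomputed: X.W costs n*d*d; (X.W).tX costs n*n*d.
    return n * d * d + n * n * d
```

Both paths give the same scores up to floating-point error. Under this counting, though, the merged path saves a matrix product but not necessarily work: W is d×d, so when d_k is much smaller than d, the two larger products can cost more than the three smaller ones. With the toy sizes above, `mults_separate(4, 6, 2)` is 128 and `mults_merged(4, 6, 2)` is 240.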

libraryofbabel, last Thursday at 1:11 AM

Oh yes! That's probably more important, in fact.
