I published a video that explains self-attention and multi-head attention in a different way: going from intuition, to math, to code, starting from the end result and walking backward to the actual method.
Hopefully this sheds light on this important topic in a way that is different from other approaches and provides the clarity needed to understand the Transformer architecture. The explanation starts at 41:22 in the video below.
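For reference, here is a minimal sketch of the "end result" that kind of walkthrough arrives at: scaled dot-product self-attention for a single head, softmax(QK^T / sqrt(d_k)) V. The variable names, shapes, and NumPy implementation below are my own illustration, not code from the video.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence (single head).

    X:          (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    Returns:    (seq_len, d_k) attended representations
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query with every key
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

# Toy usage: 4 tokens, model dim 8, head dim 4 (hypothetical sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 4)
```

Multi-head attention simply runs several of these heads in parallel with their own projection matrices and concatenates the results; the video covers how and why that works.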