Nice! I tried to write up a simpler explanation of attention in LLMs a few days back too @ https://kaamvaam.com/machine-learning-ai/llm-attention-expla... One thing that stumped me for a bit was why the V matrix is needed at all.
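Here is a minimal NumPy sketch of single-head scaled dot-product self-attention that may help with the V question. It assumes the standard formulation softmax(QK^T / sqrt(d_k)) V. The names (`attention`, `W_q`, `W_k`, `W_v`) and the random toy inputs are illustrative only and are not taken from the linked write-up. Q and K only decide *how much* each token attends to the others. V (via `W_v`) decides *what content* actually gets mixed. If `W_v` is the identity, the output collapses to a weighted average of the raw embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over embeddings X (n x d)."""
    Q = X @ W_q  # queries: what each token is looking for
    K = X @ W_k  # keys: what each token can be matched on
    V = X @ W_v  # values: what each token passes along once matched
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # n x n, each row sums to 1
    return weights @ V, weights

# Toy example: 4 tokens with 8-dimensional embeddings and random projections.
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.standard_normal((n, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

out, weights = attention(X, W_q, W_k, W_v)

# With W_v = identity the attention weights are unchanged (they depend only on
# Q and K), but the output is just a mix of the raw input embeddings.
out_no_v, _ = attention(X, W_q, W_k, np.eye(d))
```

One way to read this: the same embedding often has to play two roles. It has to be matched on, through Q and K, and it has to be forwarded, through V. A separate value projection lets the model learn these two roles independently instead of forcing the matching features to also be the forwarded ones.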