Adding to that excellent high level explanation of what the attention mechanism is, I’d add (from my reading of the abstract of this paper);
This work builds a model that has the ability to “remember” parts of its previous input when generating and processing new input, and has part of its intelligence devoted to determining what is relevant to remember.
This is in lieu of kind of saying “I need to keep re-reading what I’ve already read and said to keep going”.
Adding to that excellent high level explanation of what the attention mechanism is, I’d add (from my reading of the abstract of this paper);
This work builds a model that has the ability to “remember” parts of its previous input when generating and processing new input, and has part of its intelligence devoted to determining what is relevant to remember.
This is in lieu of kind of saying “I need to keep re-reading what I’ve already read and said to keep going”.
I’d welcome better explanations. :)