Hacker News

riku_iki · last Sunday at 4:23 PM

The post starts with an incorrect statement right away:

"The Transformer architecture revolutionized sequence modeling with its introduction of attention"

Attention was developed before transformers.


Replies

Alifatisk · last Sunday at 10:35 PM

> Attention was developed before transformers.

I just looked this up and it’s true; this completely changes the timeline I had in my mind! I thought the Transformer paper was what introduced the attention mechanism, but it existed before and was applied to RNN encoder-decoders. Wow
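
For anyone curious, here is a rough sketch of that pre-Transformer, Bahdanau-style additive attention over RNN encoder states (from "Neural Machine Translation by Jointly Learning to Align and Translate", 2014). All names, shapes, and weights below are illustrative assumptions, not code from any particular paper or library:

    # Minimal sketch of additive attention as used with RNN encoder-decoders
    # before the Transformer. Shapes and parameter names are illustrative.
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
        """Return a context vector: a weighted sum of encoder states,
        with weights scored against the current decoder state."""
        # score_i = v^T tanh(W_dec s + W_enc h_i) for each encoder state h_i
        scores = np.array([
            v @ np.tanh(W_dec @ decoder_state + W_enc @ h)
            for h in encoder_states
        ])
        weights = softmax(scores)  # alignment weights over source positions
        context = (weights[:, None] * encoder_states).sum(axis=0)
        return context, weights

    # Toy usage: 5 source positions, hidden size 8, attention size 16.
    rng = np.random.default_rng(0)
    enc = rng.normal(size=(5, 8))   # encoder RNN hidden states
    dec = rng.normal(size=8)        # current decoder RNN state
    W_d = rng.normal(size=(16, 8))
    W_e = rng.normal(size=(16, 8))
    v = rng.normal(size=16)
    ctx, w = additive_attention(dec, enc, W_d, W_e, v)
    print(w.round(3), ctx.shape)    # weights sum to 1; context has hidden size 8

The key point is that the attention weights are computed against RNN hidden states at each decoding step; the Transformer later dropped the recurrence and kept (a scaled dot-product variant of) the attention.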
