A decoder predicts the next word (token) from the left context, and by doing this iteratively it generates a whole sentence. An encoder masks a word in the middle of a sentence and tries to predict that masked word from the surrounding context on both sides.
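The two objectives can be sketched in a few lines of plain Python (no ML library; the function names here are illustrative, not from any framework). The causal mask is what lets a decoder only see leftward context, while the masked-LM setup hides one token and leaves everything else visible.

```python
def causal_mask(n):
    # Decoder-style attention mask: position i may attend only to
    # positions j <= i, so the model must predict the next token
    # from the left context alone.
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def masked_lm_example(tokens, mask_index, mask_token="[MASK]"):
    # Encoder-style training example: hide one token in the middle;
    # the model sees the full bidirectional context and must recover
    # the hidden token.
    corrupted = list(tokens)
    target = corrupted[mask_index]
    corrupted[mask_index] = mask_token
    return corrupted, target

print(causal_mask(4))
print(masked_lm_example(["the", "cat", "sat", "down"], 2))
```

For a 4-token sequence the causal mask is lower-triangular, and the masked-LM example returns `["the", "cat", "[MASK]", "down"]` with target `"sat"`.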
The original transformer paper from Google used an encoder-decoder architecture, but then the encoder-only BERT was hot, and then the decoder-only GPT was hot; now encoder-decoder is hot again!
Decoders are good at generative tasks: chatbots, etc.
Encoders are good at summarization. Encoder-decoders are better at summarization. It's a step toward "understanding" (quotes needed).