The only thing it buys you is a more “natural” embedding, i.e. the encoder can hand you a bag o’ floats representing a text - but that alone doesn’t make it a good embedding engine; I’d strongly assume you’d still do further training.
The decoder gets you the autoregressive generation you’d use for an LLM.
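Rough sketch of what I mean, using a T5-style checkpoint through Hugging Face transformers (the model choice and the mean pooling are just placeholders, not a recommendation):

    import torch
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tok = AutoTokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")
    inputs = tok("some text to represent", return_tensors="pt")

    # Encoder only: one pass over the input -> per-token hidden states,
    # mean-pooled into a "bag o' floats" you can treat as an embedding.
    with torch.no_grad():
        embedding = model.encoder(**inputs).last_hidden_state.mean(dim=1)

    # Encoder+decoder together: autoregressive generation, like a normal LLM.
    gen = model.generate(**inputs, max_new_tokens=20)
    print(tok.decode(gen[0], skip_special_tokens=True))

The pooling is the naive part; that’s where the further training I mentioned would come in.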
Beyond that, there’s the supposed advantage that small LLMs train better this way - small models kinda hit a wall a year or two ago IMHO. E.g. the original small Gemma 3 models were short-context and text-only.
As far as I understand, you pay for that with ~2x the inference cost at runtime.
(Would be happy to be corrected on any of the above - I maintain a multi-platform app that runs llama.cpp inference in addition to standard LLMs, and I do embeddings locally, so I’m operating from practical understanding more than an ML PhD.)
In general, encoder+decoder models are much more efficient at inference than decoder-only models because they run over the entire input all at once (which leverages parallel compute more effectively).
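To make the “entire input all at once” point concrete, here’s a rough sketch of the inference shape with a seq2seq model in transformers (greedy decoding, purely illustrative - in practice you’d just call generate()):

    import torch
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tok = AutoTokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")
    src = tok("translate English to German: The house is small.", return_tensors="pt")

    # Encoder: a single parallel pass over the whole input, computed once.
    with torch.no_grad():
        enc = model.encoder(**src)

    # Decoder: only the output is sequential; each step cross-attends to the
    # fixed encoder states instead of re-reading the input.
    dec_ids = torch.tensor([[model.config.decoder_start_token_id]])
    for _ in range(20):
        with torch.no_grad():
            out = model(encoder_outputs=enc, decoder_input_ids=dec_ids,
                        attention_mask=src["attention_mask"])
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        dec_ids = torch.cat([dec_ids, next_id], dim=-1)
        if next_id.item() == model.config.eos_token_id:
            break
    print(tok.decode(dec_ids[0], skip_special_tokens=True))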
The issue is that they’re generally harder to train (you need input/output pairs as a training dataset) and don’t generalize as naturally.