VHRanger • yesterday at 1:01 AM • 1 reply

In general encoder+decoder models are much more efficient at inference than decoder-only models because they run over the entire input all at once (which leverages parallel compute more effectively).

The issue is that they're generally harder to train (they need input/output pairs as a training dataset) and don't naturally generalize as well.

Replies

GaggiX • yesterday at 4:39 AM

> In general encoder+decoder models are much more efficient at inference than decoder-only models because they run over the entire input all at once (which leverages parallel compute more effectively).

Decoder-only models also do this; the only difference is that they use masked (causal) attention.
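
To make the masking point concrete, here is a minimal sketch in plain Python (not taken from any particular model or library). It assumes a single attention head with no learned projections, so queries, keys and values are all just the toy input `x`; the names `attention`, `softmax`, `x` and `x2` are made up for illustration. Both variants process every input position in one pass; the causal flag only hides later positions from earlier ones.

```python
import math

def softmax(xs):
    # Numerically stable softmax; masked entries (-inf) get weight 0.0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, causal):
    """Single-head scaled dot-product attention over the whole sequence.

    q, k, v are lists of n vectors. With causal=True, position i may only
    attend to positions j <= i (the decoder-style mask).
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            if causal and j > i:
                scores.append(-math.inf)  # hide future positions
            else:
                dot = sum(a * b for a, b in zip(q[i], k[j]))
                scores.append(dot / math.sqrt(d))
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(n))
                    for c in range(len(v[0]))])
    return out

# Toy "prompt" of 4 token vectors (assumption: q = k = v = x, no projections).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
full = attention(x, x, x, causal=False)    # encoder-style, bidirectional
masked = attention(x, x, x, causal=True)   # decoder-style, causal mask

# Change only the last token and recompute both variants.
x2 = x[:-1] + [[-3.0, 2.0]]
full2 = attention(x2, x2, x2, causal=False)
masked2 = attention(x2, x2, x2, causal=True)

print("causal: earlier outputs unchanged?", masked[:3] == masked2[:3])
print("bidirectional: earlier outputs unchanged?", full[:3] == full2[:3])
```

With the mask, the outputs for the first three positions do not depend on the last token, while without it they do. In both cases all four positions are computed from the same batch of scores, which is the sense in which a decoder-only model also runs over the whole input at once.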
