Hacker News

sigmoid10 · last Thursday at 10:59 PM · 1 reply

They trained it to be used like any other decoder-only model, so essentially text generation. But you could use the encoder part for things like classification without much effort. Then again, you can also slap a classifier head on any decoder model. The main reason they seem to be doing this is to have swappable encoder/decoder parts in an otherwise standard LLM, but I'm not sure that's really something we needed.
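To illustrate the "slap a classifier head on any decoder model" point, here's a minimal sketch in PyTorch (a toy causal stack with illustrative names, not any real checkpoint): run the decoder with a causal mask, take the last token's hidden state, and project it to class logits.

```python
import torch
import torch.nn as nn

class TinyDecoderClassifier(nn.Module):
    """Toy decoder-only transformer with a bolted-on classification head."""

    def __init__(self, vocab_size=1000, d_model=64, nhead=4,
                 num_layers=2, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # A "decoder-only" LM is just a self-attention stack run with a
        # causal mask, so TransformerEncoder + mask is enough here.
        self.blocks = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, num_classes)  # the added head

    def forward(self, input_ids):
        seq_len = input_ids.size(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.blocks(self.embed(input_ids), mask=causal_mask)
        # Use the last token's hidden state as the sequence representation:
        # under causal attention it is the only position that saw everything.
        return self.classifier(h[:, -1, :])

model = TinyDecoderClassifier()
logits = model(torch.randint(0, 1000, (2, 16)))  # batch of 2 sequences
print(logits.shape)  # torch.Size([2, 3])
```

In practice you'd do the same thing on a pretrained LM and finetune, which is exactly why the encoder part isn't strictly needed for classification.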


Replies

VHRanger · yesterday at 12:59 AM

Encoder/decoder models are much, much more efficient for finetuning and inference than decoder-only models.

Historically, T5 models are good when you finetune them into task-specific models (translation, summarization, etc.).
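Part of the inference-efficiency argument can be sketched concretely: in a T5-style encoder/decoder, the encoder processes the input exactly once, and every decoding step only cross-attends into that cached output, whereas a decoder-only model keeps the full prompt in its self-attention context. A minimal PyTorch sketch (random embeddings standing in for real tokens):

```python
import torch
import torch.nn as nn

d_model, nhead = 64, 4
enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
decoder = nn.TransformerDecoder(dec_layer, num_layers=2)

src = torch.randn(1, 100, d_model)  # a 100-token input, already embedded
memory = encoder(src)               # encoder runs exactly once

tgt = torch.randn(1, 1, d_model)    # start-token embedding
for _ in range(5):                  # each step reuses `memory` via cross-attention
    out = decoder(tgt, memory)
    next_tok = out[:, -1:, :]       # greedy stand-in for sampling a token
    tgt = torch.cat([tgt, next_tok], dim=1)

print(memory.shape, tgt.shape)  # torch.Size([1, 100, 64]) torch.Size([1, 6, 64])
```

The decoder's self-attention only ever runs over the short generated prefix, not the full 100-token input, which is where the efficiency claim for tasks like translation and summarization comes from.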
