Hacker News

potatoman22 · last Thursday at 10:54 PM · 2 replies

What's the use case of models like T5 compared to decoder-only models like Gemma? More traditional ML/NLP tasks?


Replies

sigmoid10 · last Thursday at 10:59 PM

They trained it to be used like any other decoder-only model, so essentially for text generation. But you could use the encoder part for things like classification without much effort. Then again, you can also slap a classifier head on any decoder model. The main reason they seem to be doing this is to have swappable encoder/decoder parts in an otherwise standard LLM, but I'm not sure that's really something we needed.
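Concretely, "using the encoder for classification" usually means pooling the per-token encoder states into one vector and putting a small linear head on top. A minimal numpy sketch of that idea, where the encoder output, shapes, and head weights are all made-up stand-ins (not taken from any actual T5/Gemma checkpoint):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder output: one hidden state per input token
# (seq_len=8 tokens, hidden_size=16), as a T5-style encoder would produce.
encoder_states = rng.normal(size=(8, 16))

# The "classifier head": mean-pool the token states into a single
# vector, then project to class logits with a learned linear layer.
num_classes = 3
W = rng.normal(size=(16, num_classes))  # head weights (trained in practice)
b = np.zeros(num_classes)               # head bias

pooled = encoder_states.mean(axis=0)           # (16,) sequence representation
logits = pooled @ W + b                        # (3,) class scores
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over classes

print(probs.shape)  # (3,)
```

The same recipe works on a decoder-only model too; you'd just pool its hidden states (or take the last token's state) instead of encoder states.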

refulgentis · last Thursday at 11:04 PM

The only thing it buys you is a more "natural" embedding, i.e. the encoder can give you a bag o' floats representing a text. But that doesn't automatically make it a good embedding engine; I strongly assume you'd need further training.
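That "bag o' floats" is typically produced by masked mean pooling over the encoder's token states, and compared with cosine similarity. A minimal numpy sketch under those assumptions, with random stand-in tensors in place of real encoder outputs:

```python
import numpy as np

def mean_pool(token_states, attention_mask):
    # Average hidden states over real (non-padding) tokens only.
    mask = attention_mask[:, None].astype(float)
    return (token_states * mask).sum(axis=0) / mask.sum()

rng = np.random.default_rng(1)

# Two hypothetical encoder outputs (seq_len=6, hidden_size=8),
# each with a padding mask marking which tokens are real.
states_a = rng.normal(size=(6, 8)); mask_a = np.array([1, 1, 1, 1, 0, 0])
states_b = rng.normal(size=(6, 8)); mask_b = np.array([1, 1, 1, 1, 1, 0])

emb_a = mean_pool(states_a, mask_a)  # (8,) text embedding
emb_b = mean_pool(states_b, mask_b)  # (8,) text embedding

# Cosine similarity between the two embeddings.
cos = emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
print(float(cos))
```

With an untrained (here random) encoder those similarities are meaningless, which is the point above: the encoder gives you the vectors for free, but contrastive or similar fine-tuning is what makes them useful.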

The decoder gets you the autoregressive generation you'd use for an LLM.

Beyond that, there's the advantage that small LLMs train better this way; they kinda hit a wall a year or two ago IMHO. E.g. the original Gemma 3 small models were short-context and text-only.

As far as I understand, you pay for that with roughly 2x inference cost at runtime.

(I'd be happy to be corrected on any of the above. I maintain a multi-platform app that runs llama.cpp inference in addition to standard LLMs, and I do embeddings locally, so I'm operating from a practical understanding rather than an ML PhD.)
