They are comparing 1B Gemma to 1+1B T5Gemma 2. Obviously a model with twice as many parameters can do better. That says nothing about the benefits of the architecture itself.
> 128k context.
Don't care. Prove effective context length or gtfo.
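For what "prove it" would even mean: a minimal needle-in-a-haystack probe, sketched below. `generate_fn` is a hypothetical stand-in for whatever inference call you actually have; none of this comes from the T5Gemma 2 release.

```python
# Minimal needle-in-a-haystack probe: plant one fact at a chosen depth inside
# filler text, ask for it back, and see at what length retrieval falls apart.
NEEDLE = "The secret access code is 7412."
QUESTION = "What is the secret access code?"
FILLER = "The quick brown fox jumps over the lazy dog. "

def build_prompt(total_words: int, depth: float) -> str:
    """Bury NEEDLE at `depth` (0.0 = start, 1.0 = end) of ~total_words of filler."""
    words = (FILLER * (total_words // 9 + 1)).split()[:total_words]
    words.insert(int(len(words) * depth), NEEDLE)
    return " ".join(words) + f"\n\nQuestion: {QUESTION}\nAnswer:"

def run_probe(generate_fn, lengths=(1_000, 8_000, 64_000), depths=(0.1, 0.5, 0.9)):
    # generate_fn: prompt -> completion string (hypothetical; plug in your own model).
    for n in lengths:
        hits = sum("7412" in generate_fn(build_prompt(n, d)) for d in depths)
        print(f"{n:>7} words: {hits}/{len(depths)} needles retrieved")

# Stub so this runs end to end: a fake "model" that only sees the last 500 characters.
run_probe(lambda prompt: "7412" if "7412" in prompt[-500:] else "no idea")
```

Sweep lengths and depths against the real model and the point where hits drop off is your effective context length, whatever the advertised number says.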
What is an encoder-decoder model? Is it some kind of LLM, or a subcomponent of an LLM?
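For anyone else wondering: it's a full LLM, just wired differently from a decoder-only model. A minimal sketch with the Hugging Face transformers library, using t5-small and gpt2 as stand-ins (not assuming the actual T5Gemma 2 checkpoint IDs):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM

# Encoder-decoder (T5-style): the encoder reads the whole input bidirectionally,
# then a separate decoder generates the output while cross-attending to it.
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = t5_tok("translate English to German: The house is small.", return_tensors="pt")
out = t5.generate(**inputs, max_new_tokens=20)
print(t5_tok.decode(out[0], skip_special_tokens=True))

# Decoder-only (Gemma-style): a single stack that both reads the prompt and
# continues it left-to-right; there is no separate encoder.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = gpt_tok("The house is", return_tensors="pt")
out = gpt.generate(**inputs, max_new_tokens=10)
print(gpt_tok.decode(out[0], skip_special_tokens=True))
```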
What is the "X" in the pentagonal performance comparison? Is it multilingual performance, or something else?
What's the use case of models like T5 compared to decoder-only models like Gemma? More traditional ML/NLP tasks?
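One concrete example of the "traditional NLP" angle: the encoder half doubles as a bidirectional text encoder, so you get embeddings for classification or retrieval without the generation machinery. Sketch below, again with t5-small standing in for any T5-style checkpoint:

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

tok = AutoTokenizer.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")  # encoder stack only

texts = ["the movie was great", "the movie was terrible"]
batch = tok(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state        # (batch, seq_len, dim)

# Mean-pool over non-padding tokens to get one embedding per text.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                                 # torch.Size([2, 512]) for t5-small
```

A decoder-only model can be coaxed into producing embeddings too, but the bidirectional encoder gives them to you for free, which is a big part of why T5-style models stuck around for classic NLP pipelines.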
> Note: we are not releasing any post-trained / IT checkpoints.
I get not wanting to cannibalize Gemma, but that's weird. A 540M multimodal model that performs well on queries would be useful, and "just post-train it yourself" is not always an option.
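For a sense of what "post-train it yourself" entails at minimum, here is a toy supervised fine-tuning loop on an encoder-decoder checkpoint (t5-small as a stand-in, toy data). A real instruction-tune on top of this needs curated data, preference tuning, safety work, and evals, which is exactly why it isn't always an option:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# A single toy instruction-style pair; a real run needs thousands of examples.
prompt = "Answer the question: What is the capital of France?"
target = "Paris"
batch = tok(prompt, return_tensors="pt")
labels = tok(target, return_tensors="pt").input_ids

model.train()
for step in range(3):                        # toy loop, not a real schedule
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {loss.item():.3f}")
```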