Hacker News

jeffjeffbear, last Thursday at 9:11 PM

Isn't finetuning the point of T5-style models, since they perform better at smaller parameter counts?
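(For reference, a rough sketch of what a single text-to-text finetuning step on a T5-style encoder-decoder looks like with Hugging Face transformers; the t5-small checkpoint and the toy input/target pair are placeholders, not anything from the thread.)

    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Placeholder checkpoint; any T5-style encoder-decoder is finetuned the same way.
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    # Toy input/target pair; a real run loops over a task-specific dataset.
    inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.",
                       return_tensors="pt")
    labels = tokenizer("A fox jumps over a dog.", return_tensors="pt").input_ids

    model.train()
    loss = model(**inputs, labels=labels).loss  # loss is computed against the target tokens
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()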


Replies

refulgentis, last Thursday at 10:58 PM

It’ll be a major pain in the ass to replicate exactly what they did to make it long-context and multimodal. Sucks too, because the smol Gemma 3s with the same parameter count were neither.
