Hacker News

jeffjeffbear, last Thursday at 9:11 PM

Isn't finetuning the point of T5-style models, since they perform better at smaller parameter counts?
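(For reference, a rough sketch of what a single text-to-text finetuning step on a T5-style encoder-decoder looks like with Hugging Face transformers; the t5-small checkpoint and the toy input/target pair are placeholders, not anything from the thread.)

    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Placeholder checkpoint; any T5-style encoder-decoder is finetuned the same way.
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    # Toy input/target pair; a real run loops over a task-specific dataset.
    inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.",
                       return_tensors="pt")
    labels = tokenizer("A fox jumps over a dog.", return_tensors="pt").input_ids

    model.train()
    loss = model(**inputs, labels=labels).loss  # loss is computed against the target tokens
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()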


Replies

refulgentis, last Thursday at 10:58 PM

It’ll be a major pain in the ass to replicate exactly what they did to make it long-context and multimodal. Sucks too, because the smol Gemma 3s with the same parameter count were neither.
