It’ll be a major pain in the ass to replicate exactly what they did to make it long-context and multimodal. Sucks too because the smol Gemma 3s with the same parameter count were neither.
> https://huggingface.co/google/t5gemma-2-1b-1b
From here it looks like it's still long-context and multimodal, though?
> Inputs and outputs
>
> Input:
> - Text string, such as a question, a prompt, or a document to be summarized
> - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each
> - Total input context of 128K tokens
>
> Output:
> - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
> - Total output context up to 32K tokens
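As a rough sanity check on those numbers: at 256 tokens per image, the 128K input window leaves room for a lot of images. A quick back-of-the-envelope sketch (the `max_images` helper is just mine for illustration, not anything from the model card or transformers):

```python
# Token-budget math from the t5gemma-2-1b-1b model card:
# 128K-token input context, each 896x896 image encoded to 256 tokens.
INPUT_CONTEXT = 128 * 1024   # 131,072 tokens, assuming 128K means 128 * 1024
TOKENS_PER_IMAGE = 256

def max_images(prompt_tokens: int) -> int:
    """How many images fit alongside a text prompt of the given token count."""
    return (INPUT_CONTEXT - prompt_tokens) // TOKENS_PER_IMAGE

print(max_images(0))     # 512 images with no text at all
print(max_images(4096))  # 496 images alongside a 4K-token prompt
```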