lukeinator42 • last Thursday at 7:52 PM • 1 reply • view on HN

Very cool! I was wondering, is a separate model performing speech recognition for the voice demos such as the game? The FunctionGemma model card only seems to show text input/output.

Replies

canyon289 • last Thursday at 8:41 PM

Yes, a separate model is performing ASR in this case. The Gemma270m models (base, function, and others) are not multimodal out of the box.

That being said, if someone in the community wanted to use other encoders like SigLIP and plug them into Gemma270m to make it multimodal, that'd be a great way to have fun over break and build up an AI Engineer resume :)
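
The comment does not say how an encoder would be plugged in. One common pattern, assumed here purely for illustration, is to project the encoder's output vectors to the language model's embedding width and prepend them to the text token embeddings as "soft tokens". The toy sketch below shows only that shape bookkeeping in plain Python. The dimensions, function names, and random weights are all hypothetical, and no real SigLIP or Gemma API is used.

```python
import random

def make_projection(in_dim, out_dim, seed=0):
    """Random stand-in for a learned linear projection (in_dim x out_dim)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(out_dim)] for _ in range(in_dim)]

def project(features, weights):
    """Map each encoder vector (in_dim) to the LM embedding width (out_dim)."""
    out_dim = len(weights[0])
    return [
        [sum(vec[i] * weights[i][j] for i in range(len(vec))) for j in range(out_dim)]
        for vec in features
    ]

def build_inputs(image_features, text_embeddings, weights):
    """Prepend projected image 'soft tokens' to the text token embeddings."""
    return project(image_features, weights) + text_embeddings

# Toy sizes chosen for readability; these are NOT the real model widths.
ENCODER_DIM, LM_DIM = 8, 4
weights = make_projection(ENCODER_DIM, LM_DIM)
image_features = [[1.0] * ENCODER_DIM for _ in range(3)]  # stand-in for 3 patch vectors
text_embeddings = [[0.0] * LM_DIM for _ in range(5)]      # stand-in for 5 token embeddings

inputs = build_inputs(image_features, text_embeddings, weights)
print(len(inputs), len(inputs[0]))  # sequence length, embedding width
```

In a real project the projection would be trained (often with the encoder frozen) and the combined sequence would be fed to the language model as input embeddings. That training step is the actual work and is not shown here.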
