
lukeinator42, last Thursday at 7:52 PM

Very cool! I was wondering, is a separate model performing speech recognition for the voice demos such as the game? The FunctionGemma model card only seems to show text input/output.


Replies

canyon289, last Thursday at 8:41 PM

Yes, a separate model is performing ASR in this case. The Gemma270m variants (base, function, and others) are not multimodal out of the box.

That being said, if someone in the community wanted to use other encoders like SigLIP and plug them into Gemma270m to make it multimodal, that'd be a great way to have fun over break and build up an AI Engineer resume :)
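
For anyone curious what "plugging in" an encoder could look like, here is a rough sketch of one common pattern: project the encoder's output vectors to the language model's embedding width, then prepend them to the text token embeddings. This is only an illustration under assumptions, not how any Gemma release is wired. The function names (`make_projector`, `project`, `build_input_sequence`) and the toy sizes are hypothetical, the weights are random rather than trained, and plain Python lists stand in for real tensors.

```python
import random

def make_projector(d_enc, d_model, seed=0):
    """Random d_enc x d_model weight matrix (a stand-in for a trained projector)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(d_enc)]

def project(features, weights):
    """Map each encoder vector (length d_enc) to the LM embedding width (d_model)."""
    d_model = len(weights[0])
    return [
        [sum(x * weights[i][j] for i, x in enumerate(vec)) for j in range(d_model)]
        for vec in features
    ]

def build_input_sequence(encoder_features, text_embeddings, weights):
    """Prepend the projected encoder vectors to the text token embeddings."""
    return project(encoder_features, weights) + text_embeddings

# Toy, hypothetical sizes: 3 encoder patches of width 8, 5 text tokens of width 4.
D_ENC, D_MODEL = 8, 4
patches = [[1.0] * D_ENC for _ in range(3)]
text = [[0.0] * D_MODEL for _ in range(5)]
W = make_projector(D_ENC, D_MODEL)

sequence = build_input_sequence(patches, text, W)
print(len(sequence), len(sequence[0]))  # 8 4
```

In a real experiment the projector would be trained (often with the encoder frozen), and the combined sequence would be fed to the language model through its embedding inputs rather than token IDs.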