We evaluate many of the things you alluded to, such as speed on device and output correctness, and also "is this something that would be useful" — that last one being a bit abstract.
The way we think about it is: what do we think developers and users need, and is there a way we can fill that gap in a useful way? With this model we had the same hypothesis you did — there are fantastic larger models out there pushing the frontier of AI capabilities, but there's also a niche for a smaller, customizable model that's quick to run and quick to tune.
What is optimal ultimately falls to you and your use cases (which I'm only guessing at here) — you now have options between Gemini and Gemma.
Thanks - yeah, I'm capable of assessing for my own use cases. I guess I was trying to muse out loud about whether there's a useful benchmark to be made or published out of these assessments. There are a number of architectures where there's a fast loop and then a slow loop; robotics comes to mind. I think training the ability to be like "uh oh, better hand this over to the slow, careful thinking" into the fast-loop models is likely to be super useful.
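To make the fast-loop/slow-loop idea concrete, here's a minimal sketch of confidence-gated escalation: a cheap model answers most inputs, but defers to a slower, stronger model when its own confidence drops. Everything here (`fast_model`, `slow_model`, the threshold, the toy logic) is hypothetical and illustrative, not any real API.

```python
# Hypothetical fast/slow loop: the fast model reports a confidence score,
# and low-confidence inputs are escalated to the slow model.
CONFIDENCE_THRESHOLD = 0.8

def fast_model(x):
    # Stand-in for a small on-device model: returns (answer, confidence).
    answer = x * 2
    confidence = 0.95 if x < 10 else 0.5  # "uh oh" on harder inputs
    return answer, confidence

def slow_model(x):
    # Stand-in for a large, slower model assumed to be reliable.
    return x * 2

def respond(x):
    answer, confidence = fast_model(x)
    if confidence >= CONFIDENCE_THRESHOLD:
        return answer, "fast"
    # Low confidence: hand off to the slow, careful loop.
    return slow_model(x), "slow"

print(respond(3))   # → (6, 'fast')
print(respond(42))  # → (84, 'slow')
```

The interesting training question is where that confidence signal comes from — a calibrated head, a verifier, or just the model learning to emit an "escalate" token.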