We are always figuring out what parameter size makes sense.
The decision is always a balance between how good we can make the models from a technical standpoint and how good they need to be to make all of you super excited to use them. And it's a bit of a challenge in what is an ever-changing ecosystem.
I'm personally curious: is there a certain parameter size you're looking for?
120B would be great to have if you have it stashed away somewhere. GPT-OSS-120B still stands as one of the best (and fastest) open-weights models out there. A direct competitor in the same size range would be awesome. The closest recent release was Qwen3.5-122B-A10B.
Jeff Dean apparently didn't get the message that you weren't releasing the 124B MoE :D
Was it too good or not good enough? (blink twice if you can't answer lol)
Mainline consumer cards are 16GB, so everyone wants models they can run on their $400 GPU.
My sweet spot is something that runs in less than 128GB.
(I have a DGX Spark and an MBP w/ 128GB.)
I'll chime in: a series of Mac-optimized MoEs that can stream experts just in time would be really amazing. And popular; I'm guessing that within the next year we'll be able to run a very capable openclaw with a stack like that. You'll get a lot of installs there. If I were a PM at Gemma, I'd release a stack for each Mac mini memory size.
Something in the 60B to 80B range would still be approachable for most people running local models and also could give improved results over 31B.
Also, as I understand it, the 26B is the MoE and the 31B is dense. Why is the larger one dense and the smaller one MoE?
> how good they need to be to make all of you super excited to use them
Isn't that dictated more by the competition you're facing from Llama and Qwen?
For the many DGX Spark and Strix Halo users with 128GB of memory, I believe the ideal model size would probably be a MoE with close to 200B total parameters and a low active count of 3B to 10B.
I would personally love to see a super sparse 200B A3B model, just to see what is possible. These machines don't have a lot of bandwidth, so a low active count is essential to getting good speed, and a high total parameter count gives the model greater capability and knowledge.
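A rough back-of-envelope sketch of why the active count matters: during decoding, each generated token has to read roughly all of the active weights from memory once, so bandwidth divided by the size of the active weights gives an upper bound on tokens per second. The 250 GB/s bandwidth and the 4-bit weights below are illustrative assumptions, not measured specs for any particular machine, and the function name is made up for this example. The estimate ignores KV-cache reads, compute, and other overhead, so real speeds would be lower.

```python
def max_decode_tokens_per_sec(active_params_b, bits_per_weight, bandwidth_gb_s):
    """Upper bound on decode speed when generation is memory-bandwidth bound.

    active_params_b: active parameters per token, in billions
    bits_per_weight: storage precision of the weights (e.g. 4 for Q4)
    bandwidth_gb_s:  memory bandwidth in GB/s (assumed value, see below)
    """
    # GB of weights that must be read from memory for each generated token
    active_weights_gb = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / active_weights_gb


BANDWIDTH_GB_S = 250.0  # assumption for illustration, not a hardware spec

# 3B and 10B are the active counts suggested above; 31B stands in for a dense model
for active_b in (3, 10, 31):
    tps = max_decode_tokens_per_sec(active_b, 4, BANDWIDTH_GB_S)
    print(f"{active_b}B active at 4-bit: at most ~{tps:.0f} tok/s")
```

Under these assumptions, the ceiling scales inversely with the active count: halving the active parameters doubles the best-case speed, regardless of how many total parameters the model has.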
It would also be essential to have the Q4 QAT, of course. Then the 200B model weights would take up ~100GB of memory, not including the context.
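The ~100GB figure follows from simple arithmetic, sketched here under the assumption of exactly 4 bits per weight. Real Q4 formats usually store extra scale data per block, so actual files would likely come out somewhat larger; the function name is made up for this example.

```python
def weight_memory_gb(total_params_b, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    total_params_b: total parameter count, in billions
    bits_per_weight: assumed average bits per weight (4 for an idealized Q4)
    """
    return total_params_b * bits_per_weight / 8


MEMORY_GB = 128  # unified memory on the machines discussed above

for total_b in (120, 200):
    used = weight_memory_gb(total_b, 4)
    print(f"{total_b}B at 4-bit: ~{used:.0f} GB of weights, "
          f"~{MEMORY_GB - used:.0f} GB left for context and the OS")
```

The leftover number is only a rough guide, since the context (KV cache), the operating system, and any quantization overhead all come out of it.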
The common 120B size these days leaves a lot of unused memory on the table on these machines.
I would also like the larger models to support audio input, not just the E2B/E4B models. And audio output would be great too!