Over the weekend I used the small models for experimental training runs when figuring out how to build LoRAs. It takes a lot less time to do smoke tests of the process on E2B vs the 31B version. And E4B was a reasonable stop along the line just to make sure the LoRA combined with the base model to produce coherent output.
Also, they're good enough for a lot of simple categorization and data extraction tasks, e.g. something like "flag abusive posts/comments", or "visit website, find the contact info, open hours, address". And they run fast on the kind of hardware you're likely to have at home, while the bigger dense versions decidedly do not.
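A rough sketch of what the "flag abusive posts" kind of task can look like on the prompt side. The label set, prompt wording, and function names here are made up for illustration, and the actual model call is left out since it depends on how you serve the model. The idea is just to ask for a one-word answer and refuse anything that doesn't parse:

```python
from typing import Optional

# Hypothetical label set for illustration.
ALLOWED = {"abusive", "ok"}

def build_prompt(comment: str) -> str:
    """Build a one-word classification prompt for a small model."""
    return (
        "Classify the comment as exactly one word: abusive or ok.\n"
        f"Comment: {comment}\n"
        "Label:"
    )

def parse_label(raw: str) -> Optional[str]:
    """Return a valid label from the model's reply, or None if unusable."""
    words = raw.strip().lower().split()
    if not words:
        return None
    # Take the first word and drop trailing punctuation like "abusive."
    word = words[0].strip(".,!\"'")
    return word if word in ALLOWED else None
```

Returning None instead of guessing lets you retry or route the stragglers to a bigger model.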
I used Gemma 4 itself to review and prune the data (my social media posts over the last ~5 years, about 5 million words) being ingested into the training process for a LoRA for Gemma 4. I found the bigger model (31B) was more nuanced and useful than the smaller ones, and I wasn't in a big hurry by that stage of the process, so I ran the big one overnight. Gemma 4 31B was also a better judge of my writing than Gemini 2.5 Flash, by my reckoning.
It was, again, more nuanced, and was able to recognize a generally helpful comment that opened kinda jokey/rude, while the smaller model and Gemini 2.5 Flash tended to gravitate toward the extremes (1 or 5) rather than using the full 1-5 scale they were prompted to rate on. I assume Gemini 3.1 Flash is probably competitive or better, but I didn't try it, since I liked the results the self-hosted Gemma 4 was giving for free.
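A quick way to put a number on that "gravitates toward extremes" behavior, as a sketch: count what share of a judge's ratings land on 1 or 5. The function name and the sample ratings below are made up for illustration, not actual results.

```python
from collections import Counter

def extreme_share(ratings, low=1, high=5):
    """Fraction of ratings that sit at either end of the scale."""
    if not ratings:
        raise ValueError("no ratings to summarize")
    counts = Counter(ratings)
    return (counts[low] + counts[high]) / len(ratings)

# Made-up example data: one judge that polarizes, one that uses the range.
polarized = [1, 5, 5, 1, 1, 5, 3, 5]
nuanced = [2, 4, 3, 5, 1, 3, 4, 2]

print(extreme_share(polarized))
print(extreme_share(nuanced))
```

Running the same batch of posts through each judge and comparing this share is a cheap sanity check before trusting one of them to prune training data.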
The little ones also run great on very modest hardware. Both run at comfortable interactive speed on mid-range tablets. E4B is blazing fast on an iPad M4 or Pixel 10 Pro and entirely usable on a mid-range Android with sufficient RAM.