I'd second this wholeheartedly
Since building a custom agent setup to replace copilot, adopting/adjusting Claude Code prompts, and giving it basic tools, gemini-3-flash is my go-to model unless I know it's a big and involved task. The model is really good at 1/10 the cost of pro, super fast by comparison, and some basic a/b testing shows little to no difference in output on the majority of tasks I used
Cut all my subs, spend less money, don't get rate limited
Plus I've found that overall with "thinking" models, it's more like for memory, not even actual perf boost, it might even be worse because if it goes even slightly wrong on the "thinking" part, it'll then commit to that for the actual response
LLM bubble will burst the second investors figure out how much well managed local model can do
[dead]
Yeah, one of my first projects one of my buddies asked "Why aren't you using [ChatGPT 4.0] nano? It's 99% the effectiveness with 10% the price."
I've been using the smaller models ever since. Nano/mini, flash, etc.