From the article, speed & cost match 2.5 Flash. I'm working on a project where there's a huge gap between 2.5 Flash and 2.5 Flash Lite in both performance and cost.
-> 2.5 Flash Lite is super fast & cheap (~1-1.5s inference), but gives poor-quality responses.
-> 2.5 Flash gives high-quality responses, but is fairly expensive & slow (5-7s inference).
I really just need something in between Flash and Flash Lite on cost and performance. Right now, users have to wait up to 7s for a quality response.