(I work at OpenRouter) We add about 15ms of latency once the cache is warm (i.e. on subsequent requests) -- and if there are reliability problems, please let us know! OpenRouter should actually be more reliable, since we load balance and fall back between different Gemini endpoints.
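For anyone who hasn't used it, OpenRouter exposes an OpenAI-compatible endpoint, so a request looks roughly like the sketch below (the Gemini model slug is just an example; the load balancing and fallback mentioned above happen server-side, nothing extra in the client):

```python
# Minimal sketch, assuming the standard openai Python client pointed at OpenRouter.
# The model slug is illustrative; substitute whichever Gemini model you actually use.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

resp = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",  # example slug; routing/fallback is handled server-side
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```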
Is batch mode on the roadmap? As frontier model providers focus increasingly on profitability, and prices and latencies rise as a result, I can see batching becoming necessary for many use cases.
Would love to know that we can build against the OpenAI Batch API and (soon?) have a path toward being model-agnostic.
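Concretely, this is the shape of the OpenAI Batch API I'd hope a model-agnostic version would mirror -- a rough sketch, with the file name and request bodies purely illustrative:

```python
# Sketch of the OpenAI Batch API flow: upload a JSONL file of requests, then create a batch.
from openai import OpenAI

client = OpenAI()

# requests.jsonl holds one request per line, e.g.:
# {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hi"}]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll later for completion, then download the output file
```

If other providers (or an aggregator) accepted the same JSONL format and batch lifecycle, swapping models would mostly come down to changing the model field per request.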