(I work at OpenRouter) We add about 15ms of latency once the cache is warm (i.e. on subsequent requests) -- and if there are reliability problems, please let us know! OpenRouter should actually be more reliable, since we load balance and fall back between different Gemini endpoints.
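For anyone who hasn't used it, OpenRouter exposes an OpenAI-compatible endpoint, so a request looks roughly like the sketch below (the Gemini model slug is just an example; the load balancing and fallback mentioned above happen server-side, nothing extra in the client):

```python
# Minimal sketch, assuming the standard openai Python client pointed at OpenRouter.
# The model slug is illustrative; substitute whichever Gemini model you actually use.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

resp = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",  # example slug; routing/fallback is handled server-side
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```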
Is batch mode on the roadmap? As frontier model providers focus increasingly on profitability, and prices and latencies rise as a result, I can see batching becoming necessary for many use cases.
Would love to know that we can build against the OpenAI Batch API and (soon?) have a path toward being model-agnostic.
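Concretely, this is the shape of the OpenAI Batch API I'd hope a model-agnostic version would mirror -- a rough sketch, with the file name and request bodies purely illustrative:

```python
# Sketch of the OpenAI Batch API flow: upload a JSONL file of requests, then create a batch.
from openai import OpenAI

client = OpenAI()

# requests.jsonl holds one request per line, e.g.:
# {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hi"}]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll later for completion, then download the output file
```

If other providers (or an aggregator) accepted the same JSONL format and batch lifecycle, swapping models would mostly come down to changing the model field per request.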