Hacker News

keizo 08/01/2025

Has anyone done some simple latency profiling of the Gemini embedding API vs. the OpenAI embedding API? It seems like that API call is one of the biggest chunks of time in a simple RAG setup.


Replies

elliotto 08/01/2025

In my experience the API call is trivial compared to the time the LLM takes to compose the response.
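One rough way to check this for a given setup is to time each stage separately and compare medians. The following is only a sketch: `fake_embed` and `fake_generate` are hypothetical stand-ins (plain sleeps with arbitrary durations, not measurements of any provider), and they would need to be replaced with the real embedding and completion calls.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_stage(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Return wall-clock durations in seconds for `repeats` calls to fn."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(stages: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Map each stage name to its median latency in seconds."""
    return {
        name: statistics.median(time_stage(fn, repeats))
        for name, fn in stages.items()
    }


# Hypothetical stand-ins: swap in the real embedding and LLM client calls.
# The sleep durations are arbitrary placeholders, not measured values.
def fake_embed() -> None:
    time.sleep(0.01)


def fake_generate() -> None:
    time.sleep(0.05)


if __name__ == "__main__":
    results = summarize({"embed": fake_embed, "generate": fake_generate})
    for name, seconds in results.items():
        print(f"{name}: {seconds * 1000:.1f} ms (median)")
```

Using the median rather than the mean keeps a single slow network round trip from dominating the comparison; for remote APIs it is probably worth raising `repeats` and discarding the first call, which may include connection setup.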
