Has anyone done simple latency profiling of the Gemini embedding API vs the OpenAI embedding API? Seems like that API call is one of the biggest chunks of time in a simple RAG setup.
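A quick way to measure this yourself is a small timing harness: warm up, call the endpoint repeatedly, and report p50/p95 rather than a single sample (embedding latency is noisy). Sketch below; the stand-in `fake_embedding_call` is a placeholder you would swap for a real client call (the commented OpenAI wiring is illustrative, not tested here).

```python
import time
import statistics

def profile_latency(fn, n=20, warmup=3):
    """Call fn() n times after a warmup and return latency stats in ms."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return {
        "p50": statistics.median(samples),
        "p95": samples[int(0.95 * (len(samples) - 1))],
        "mean": statistics.fmean(samples),
    }

# To profile a real endpoint, swap in something like (hypothetical wiring):
#   client = OpenAI()
#   fn = lambda: client.embeddings.create(
#       model="text-embedding-3-small", input="hello world")
# For a self-contained demo, time a stand-in that sleeps ~5 ms per call.
def fake_embedding_call():
    time.sleep(0.005)

stats = profile_latency(fake_embedding_call)
print({k: round(v, 1) for k, v in stats.items()})
```

Run the same harness against both providers from the same machine/region, since network distance to the endpoint often dominates the difference.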
In my experience the API call is trivial compared to the time the LLM takes to compose the response.