Unless I'm missing something, this uses a simple synchronous for loop:
```python
for text in texts:
    key = (text, model)
    if key not in pickle_cache:
        pickle_cache[key] = openai_client.create_embedding(text, model=model)
    embeddings.append(pickle_cache[key])
operations.save_pickle_cache(pickle_cache, pickle_path)
return embeddings
```
At the roughly one-embedding-per-second throughput I was seeing, a million comments would take over a week to process!
I had to call the Gemini model with ten comments at a time from eight threads just to reach the paltry 3K RPM rate limit Google offers its "Tier 1" customers.
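That batched, multi-threaded approach is simple to sketch with a thread pool. Here `embed_batch` is a stand-in for the real provider call, not an actual Gemini client method; the batch size and worker count mirror the numbers above:

```python
from concurrent.futures import ThreadPoolExecutor


def embed_batch(batch, model):
    # Placeholder for the real API call; a production version would hit the
    # provider's batched embedding endpoint here.
    return [f"vec:{text}" for text in batch]


def embed_all(texts, model, batch_size=10, workers=8):
    """Split texts into batches and embed them from a pool of worker threads."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even though batches finish out of order.
        results = pool.map(lambda b: embed_batch(b, model), batches)
    # Flatten the per-batch results back into one flat list.
    return [vec for batch in results for vec in batch]
```

Since `ThreadPoolExecutor.map` returns results in submission order, the output lines up with the input texts without any extra bookkeeping.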
Based on this experience, for real "enterprise" customers I might implement a generic wrapper around Google's Batch API: one that continuously streams rows from a database, chunks and uploads them, and in parallel polls the pending jobs and streams the results back into the database.
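A minimal sketch of that wrapper's control flow, with `submit_job` and `poll_job` standing in for the real Batch API calls (hypothetical names, not Google's actual client) and a background thread handling the polling:

```python
import itertools
import queue
import threading
import time


def run_batch_pipeline(rows, submit_job, poll_job, write_results,
                       chunk_size=1000, poll_interval=1.0):
    """Chunk an iterable of rows, submit each chunk as a batch job, and poll
    pending jobs from a background thread, writing finished results out."""
    pending = queue.Queue()
    done_submitting = threading.Event()

    def poller():
        # Keep draining until submission has finished AND the queue is empty.
        while not (done_submitting.is_set() and pending.empty()):
            try:
                job = pending.get(timeout=poll_interval)
            except queue.Empty:
                continue
            status, results = poll_job(job)
            if status == "done":
                write_results(results)
            else:
                # Still running: wait a bit, then re-queue for another check.
                time.sleep(poll_interval)
                pending.put(job)

    worker = threading.Thread(target=poller)
    worker.start()
    it = iter(rows)
    # Submit jobs chunk by chunk while the poller runs in parallel.
    while chunk := list(itertools.islice(it, chunk_size)):
        pending.put(submit_job(chunk))
    done_submitting.set()
    worker.join()
```

A production version would want retries, a cap on in-flight jobs, and multiple poller threads, but the shape is the same: one side streams chunks in, the other drains completed jobs back out.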