Hacker News

pjot · today at 1:51 AM

I did this but used duckdb as the vector store. Works really well, quite fast too.

https://github.com/patricktrainer/duckdb-embedding-search


Replies

jiggawatts · today at 2:26 AM

Unless I'm missing something, this uses a simple synchronous for loop:

    for text in texts:
        key = (text, model)
        if key not in pickle_cache:
            pickle_cache[key] = openai_client.create_embedding(text, model=model)
        embeddings.append(pickle_cache[key])
    operations.save_pickle_cache(pickle_cache, pickle_path)
    return embeddings
At the throughput I was seeing, about one embedding per second, a million comments would take over a week to process!

I had to call the Gemini model with ten comments at a time from eight threads to reach even the paltry 3K rpm rate limit they offer to "Tier 1" customers.
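The workaround described above, batches of ten sent from eight threads, can be sketched roughly as follows. This is a minimal illustration, not code from the linked repo: `embed_batch` is a hypothetical stand-in for a real embeddings call that accepts a list of texts and returns one vector per text.

    from concurrent.futures import ThreadPoolExecutor

    def embed_batch(texts):
        # Hypothetical stand-in for a real batched embeddings API call
        # (e.g. an endpoint that takes a list of texts in one request).
        return [[float(len(t))] for t in texts]

    def embed_all(texts, batch_size=10, workers=8):
        # Chunk the inputs into batches, fan the batches out across
        # worker threads, then flatten back to one vector per text.
        batches = [texts[i:i + batch_size]
                   for i in range(0, len(texts), batch_size)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = pool.map(embed_batch, batches)
        # pool.map preserves batch order, so output order matches input.
        return [vec for batch in results for vec in batch]

With ten texts per request and eight threads in flight, each request only has to complete in about 1.6 s on average to saturate a 3K rpm limit.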

Based on this experience, for real "enterprise" customers I might implement a generic wrapper around Google's Batch API: stream rows continuously from a database, chunk and upload them, then in parallel poll the pending jobs and stream the results back into the database.
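The shape of such a wrapper might look like the sketch below. Everything here is hypothetical: `submit_job` and `job_result` stand in for whatever upload-and-poll calls the actual Batch API exposes, and the generator keeps a bounded number of jobs in flight while yielding completed results in submission order.

    import time
    from collections import deque

    def submit_job(chunk):
        # Hypothetical: upload one chunk to a batch API, return a job handle.
        return {"chunk": chunk, "done_at": time.time()}

    def job_result(job):
        # Hypothetical: poll the batch API; None until the job completes.
        if time.time() >= job["done_at"]:
            return ["embedded:" + t for t in job["chunk"]]
        return None

    def stream_batches(rows, chunk_size=100, max_pending=4):
        # Chunk an incoming row stream, keep at most `max_pending` jobs
        # in flight, and yield results as the oldest job completes.
        pending, buf = deque(), []

        def drain(block):
            # Yield results from finished jobs; optionally wait for the head.
            while pending:
                res = job_result(pending[0])
                if res is not None:
                    pending.popleft()
                    yield from res
                elif block:
                    time.sleep(0.1)
                else:
                    return

        for row in rows:
            buf.append(row)
            if len(buf) >= chunk_size:
                pending.append(submit_job(buf))
                buf = []
            while len(pending) >= max_pending:
                yield from drain(block=True)
        if buf:
            pending.append(submit_job(buf))
        while pending:
            yield from drain(block=True)

A real implementation would also need retry logic for failed jobs and checkpointing of the last row written back, so a crash mid-stream does not force reprocessing from the start.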
