Hacker News

jiggawatts · today at 12:50 AM

I could have used this just yesterday!

I've been evaluating Gemini Embedding 2 using Hacker News comments and I wasted half a day making a wrapper for the HN API to collect some sample data to play with.

In case anyone is curious:

- The ability to simply truncate the provided embedding to a prefix (and then renormalize) is useful because it lets users re-use the same (paid!) embedding API response for multiple indexes at different qualities.
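The truncate-and-renormalize trick is simple enough to sketch in a few lines. This assumes the model was trained so that prefixes of the embedding are themselves meaningful (the property that makes truncation safe at all); the function name and the toy 4-dim vector are illustrative, not from any SDK:

```python
import math

def truncate_and_renormalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length,
    so the shorter vector is still usable for cosine similarity."""
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return prefix
    return [x / norm for x in prefix]

# One paid API response can back several indexes at different sizes/qualities:
full = [0.5, 0.5, 0.5, 0.5]                     # stand-in for a full embedding
small = truncate_and_renormalize(full, 2)
print(small)                                    # [0.707..., 0.707...]
```

The renormalization step matters: after chopping off components the vector is no longer unit length, so cosine/dot-product scores would be skewed without it.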

- Traditional enterprise software vendors are struggling to keep up with the pace of AI development. Microsoft SQL Server for example can't store a 3072 element vector with 32-bit floats (because that would be 12 KB and the page size is only 8 KB). It supports bfloat16 but... the SQL client doesn't! Or Entity Framework. Or anything else.
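The page-size arithmetic behind that limitation works out like this (8 KB page size as stated above; the exact per-row overhead SQL Server adds is ignored here):

```python
DIM = 3072
PAGE_SIZE = 8 * 1024          # SQL Server data page, in bytes

float32_bytes = DIM * 4       # 4 bytes per float32 component
bfloat16_bytes = DIM * 2      # 2 bytes per bfloat16 component

print(float32_bytes, float32_bytes > PAGE_SIZE)     # 12288 True  -> won't fit
print(bfloat16_bytes, bfloat16_bytes <= PAGE_SIZE)  # 6144 True   -> fits
```

So halving the component width is what makes a 3072-dimension vector fit on a page at all, which is why the missing bfloat16 support in the client stack is such a sore point.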

- Holy cow everything is so slow compared to full text search! The model is deployed in only one US region, so from Australia the turnaround time is something like 900 milliseconds. Then the vector search over just a few thousand entries with DiskANN is another 600-800 ms! I guess search-as-you-type is out of the question for... a while.

- Speaking of slow, the first thing I had to do was write an asynchronous parallel bounded queue data processor utility class in C# that supports chunking of the input and rate limit retries. This feels like it ought to be baked into the standard library or at least the AI SDKs because it's pretty much mandatory if working with anything other than "hello world" scenarios.
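A minimal asyncio sketch of that utility, for anyone who wants the shape of it without the C#: `RateLimitError` is a stand-in for whatever exception your SDK raises on HTTP 429, and `worker` is any async function that processes one chunk (e.g. an embedding call over a batch of strings):

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for a 429 / rate-limit error from the API."""

async def process_all(items, worker, *, chunk_size=16, parallelism=4,
                      max_retries=5, base_delay=0.05):
    """Run `worker(chunk)` over fixed-size chunks of `items`, with at most
    `parallelism` chunks in flight and exponential backoff on rate limits."""
    sem = asyncio.Semaphore(parallelism)
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

    async def run_chunk(chunk):
        async with sem:
            for attempt in range(max_retries):
                try:
                    return await worker(chunk)
                except RateLimitError:
                    # Exponential backoff with jitter before retrying.
                    await asyncio.sleep(base_delay * (2 ** attempt) * random.random())
            raise RuntimeError("chunk failed after retries")

    # gather() preserves chunk order, so results line up with the input.
    results = await asyncio.gather(*(run_chunk(c) for c in chunks))
    return [r for chunk_result in results for r in chunk_result]
```

Usage would be something like `asyncio.run(process_all(texts, embed_batch, chunk_size=100, parallelism=8))`. The semaphore bounds concurrency, the chunking amortizes per-request overhead, and the backoff loop absorbs rate-limit responses; that is essentially the whole "mandatory boilerplate" the comment is lamenting.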

- Gemini Embedding 2 has the headline feature of multi-modal input, but they forgot to implement anything other than "string" for their IEmbeddingGenerator abstraction when used with Microsoft libraries. I guess the next "Preview v0.0.3-alpha" version or whatever will include it.


Replies

pjot · today at 1:51 AM

I did this but used DuckDB as the vector store. Works really well, quite fast too.

https://github.com/patricktrainer/duckdb-embedding-search
