Also worth mentioning that we use quantization extensively:
- halfvec (16-bit float) for storage
- bit (binary vectors) for indexes
This makes the storage cost and ongoing performance good enough that we could enable it in all our hosting.
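Roughly, the split looks like this (a minimal NumPy sketch of the storage arithmetic only; halfvec/bit here are just variable names echoing the types above, not actual database code):

```python
import numpy as np

# One 1024-D embedding stored three ways: full precision, half precision
# ("halfvec"-style storage copy), and 1 bit per dimension (the index copy).
rng = np.random.default_rng(0)
emb = rng.normal(size=1024).astype(np.float32)   # 4096 bytes

emb_half = emb.astype(np.float16)                # 2048 bytes, minimal recall loss
emb_bits = np.packbits(emb > 0)                  # 128 bytes for the binary index

print(emb.nbytes, emb_half.nbytes, emb_bits.nbytes)  # 4096 2048 128
```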
That's where it's at. I'm using the 1600D vectors from OpenAI models for findsight.ai, stored SuperBit-quantized. Even without fancy indexing, a full scan (1 search vector -> 5M stored vectors) takes less than 40ms. And with basic binning, it's nearly instant.
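For intuition on why a full scan stays that fast, here's a rough sketch with plain sign-bit quantization standing in for SuperBit (sizes and names are illustrative, not findsight.ai's code): XOR the packed query bytes against every stored row and popcount the result.

```python
import numpy as np

# Brute-force Hamming scan over packed binary vectors (1M rows here to keep the
# demo small; the cost scales linearly, so 5M is still just a few hundred MB).
rng = np.random.default_rng(0)
dim, n = 1600, 1_000_000
db_bits = rng.integers(0, 256, size=(n, dim // 8), dtype=np.uint8)  # 200 bytes/vector
q_bits = rng.integers(0, 256, size=dim // 8, dtype=np.uint8)

# 256-entry popcount lookup table -> Hamming distance to every stored vector
popcount = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
dists = popcount[db_bits ^ q_bits].sum(axis=1, dtype=np.uint32)
candidates = np.argpartition(dists, 100)[:100]   # closest 100 by Hamming distance
```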
I was going to say the same. We're using binary vectors in prod as well. Makes a huge difference in the indexes. This wasn't mentioned once in the article.
It still amazes me that the binary trick works.
For anyone who hasn't seen it yet: it turns out many embedding vectors of e.g. 1024 floating point numbers can be reduced to a single bit per value recording whether it's above or below 0... and in this reduced form much of the embedding math still works!
This means you can e.g. filter to the top 100 using extremely memory efficient and fast bit vectors, then run a more expensive distance calculation against those top 100 with the full floating point vectors to pick the top 10.
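A minimal sketch of that two-stage flow, with made-up data and sizes (sign-bit quantization plus the same popcount trick as the Hamming scan above; nothing here is from a specific library):

```python
import numpy as np

# Stage 1: cheap bit-vector filter to ~100 candidates.
# Stage 2: exact float math only on those survivors.
rng = np.random.default_rng(1)
n, dim = 100_000, 1024
db = rng.normal(size=(n, dim)).astype(np.float32)    # full-precision vectors
query = rng.normal(size=dim).astype(np.float32)

# Stage 1: 1 bit per value (above/below 0), Hamming distance over packed bytes.
db_bits = np.packbits(db > 0, axis=1)
q_bits = np.packbits(query > 0)
popcount = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
hamming = popcount[db_bits ^ q_bits].sum(axis=1, dtype=np.uint32)
top100 = np.argpartition(hamming, 100)[:100]

# Stage 2: exact cosine similarity against only the 100 candidates.
cands = db[top100]
scores = cands @ query / (np.linalg.norm(cands, axis=1) * np.linalg.norm(query))
top10 = top100[np.argsort(scores)[::-1][:10]]
```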