Hacker News

ANN v3: 200ms p99 query latency over 100B vectors

92 points by _peregrine_ | last Wednesday at 7:58 PM | 42 comments

Comments

jascha_eng · last Thursday at 12:38 AM

This is legitimately pretty impressive. I think the rule of thumb is now: go with Postgres (pgvector) for vector search until it breaks, then move to turbopuffer.

mmaunder · today at 3:04 PM

For those of us who operate on-site, we have to add back network latency, which negates this win entirely and makes a proprietary cloud solution like this a nonstarter.

kgeist · today at 2:31 PM

Are there vector DBs with 100B vectors in production that work well? There was a paper showing a 12% loss in accuracy at just 1M vectors. Maybe some kind of logical sharding is another option, to improve both accuracy and speed.
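To make the logical-sharding idea concrete, here's a minimal sketch: query each shard independently, then merge the per-shard results into a global top-k. All names here are hypothetical, and the brute-force per-shard scan stands in for what would be a real per-shard ANN index.

```python
import numpy as np

def search_shard(shard_vectors, query, k):
    """Exact top-k within one shard (stand-in for a per-shard ANN index)."""
    dists = np.linalg.norm(shard_vectors - query, axis=1)
    idx = np.argsort(dists)[:k]
    return [(float(dists[i]), int(i)) for i in idx]

def sharded_topk(shards, query, k):
    """Query each shard independently, then merge candidates globally.
    Each shard's index stays small, which is where the accuracy win
    would come from."""
    candidates = []
    for shard_id, vecs in enumerate(shards):
        for dist, local_id in search_shard(vecs, query, k):
            candidates.append((dist, shard_id, local_id))
    candidates.sort()  # global merge by distance
    return candidates[:k]
```

The merge step is exact, so any accuracy loss comes only from the per-shard indexes, not from the sharding itself.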

lmeyerov · today at 2:12 PM

Fun!

I was curious given the cloud discussion: a quick search suggests default AWS SSD bandwidth is 250 MB/s, and you can pay more for 1 GB/s. Similarly for S3, one HTTP connection is < 100 MB/s, and you can pay for more parallel connections. So the hot binary-quantized search index is doing a lot of work to minimize these, both for the initial hot queries and for pruning later fetches. Very cool!

alanwli · today at 6:24 PM

Out of curiosity, how is the 92% recall calculated? For a given query, is the recall compared to the true topk of all 100B vectors vs. recall at each of N shards compared to the topk of each respective shard?
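For reference, the usual definition (assuming recall@k against exact ground truth, which may or may not be what the article uses) is just the overlap between the approximate result set and the true top-k:

```python
def recall_at_k(approx_ids, true_ids):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx_ids) & set(true_ids)) / len(true_ids)

true_topk = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx_topk = [1, 2, 3, 4, 5, 6, 7, 8, 42, 99]  # 8 of 10 correct
print(recall_at_k(approx_topk, true_topk))  # 0.8
```

The commenter's question stands either way: this number differs depending on whether `true_ids` is the global top-k over all 100B vectors or the per-shard top-k, and the global version is the stricter measure.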

montroser · today at 5:08 PM

This is at 92% recall. Could be worse, but could definitely be much better. Quantization and hierarchical clustering are tricks that lead to awesome performance at the cost of extremely variable quality, depending on the dataset.

hwspeed · today at 8:51 PM

The offline/local dev point is underrated. Being able to iterate without network latency or metered API costs makes a huge difference for prototyping. The challenge is making sure your local setup actually matches prod behavior. I've been burned by pgvector working fine locally then hitting performance cliffs at scale when the index doesn't fit in memory anymore.

vander_elst · today at 8:22 PM

> 504MiB shared L3 cache

What CPU are they using here?

redskyluan · today at 4:36 PM

Using hierarchical clustering significantly reduces recall; it's a solution we used and abandoned three years ago.

shayonj · today at 1:53 PM

v cool and impressive!
