You’re not searching over 500GB, you’re searching an index of the vectors. That’s the magic of embed...

brookst • last Saturday at 1:45 PM • 1 reply • view on HN

You’re not searching over 500GB, you’re searching an index of the vectors. That’s the magic of embeddings and vector databases.

Same way you might have a 50TB relational database, but “select id, name from people where country=‘uk’ and name like ‘benj%’” might only touch a few MB of storage at most.
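
To make the relational analogy concrete, here is a minimal sketch using SQLite from Python. The `people` table layout, the index name `idx_people_country_name`, and the sample rows are assumptions made up for illustration; the only point is that the query planner answers the query through the index rather than by reading the whole table.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
# Composite index so the country filter can be resolved from the index.
conn.execute(
    "CREATE INDEX idx_people_country_name ON people (country, name)"
)
conn.executemany(
    "INSERT INTO people (id, name, country) VALUES (?, ?, ?)",
    [
        (1, "benjamin", "uk"),
        (2, "benji", "uk"),
        (3, "alice", "uk"),
        (4, "benjamin", "us"),
    ],
)

query = "SELECT id, name FROM people WHERE country = 'uk' AND name LIKE 'benj%'"

# Rows that match the filter.
matches = sorted(conn.execute(query).fetchall())

# Ask SQLite how it would execute the query; the last column of each
# plan row is a human-readable description of the step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(matches)
print(plan)
```

The exact wording of the plan varies between SQLite versions, but it should mention the index, which is the relational counterpart of searching a vector index instead of the raw archive.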

Replies

ricardobeat • last Saturday at 10:54 PM

That’s precisely the point I tried to clear up in the previous comment.

The LEANN author proposes to create a 9GB index for a 500GB archive, and the other poster argued that it is not helpful because “storage is cheap”.
