iezepov • 08/09/2025

Good point! Maybe "indexing" is a bad term here, and it's more like feature extraction (and since embeddings are high-dimensional, we extract a lot of features). From that point of view, it makes sense that "the index" takes more space than the original data.
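
To put rough numbers on this, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not a measurement of any particular model: a 512-byte text chunk, a 1024-dimensional embedding, and 4 bytes (float32) per value. The helper names are made up too.

```python
def embedding_bytes(dim: int, bytes_per_value: int = 4) -> int:
    """Storage for one dense embedding vector (float32 assumed by default)."""
    return dim * bytes_per_value


def size_ratio(text: str, dim: int, bytes_per_value: int = 4) -> float:
    """How many times larger the embedding is than the UTF-8 text it came from."""
    raw_bytes = len(text.encode("utf-8"))
    return embedding_bytes(dim, bytes_per_value) / raw_bytes


# Hypothetical chunk: 512 ASCII characters, so 512 bytes of raw text.
chunk = "x" * 512

# Assumed embedding width of 1024 dimensions: 1024 * 4 = 4096 bytes.
print(embedding_bytes(1024))         # 4096
print(size_ratio(chunk, dim=1024))   # 8.0
```

Under those assumptions the vector is about 8x the size of the text it was computed from. Shorter chunks or wider vectors push the ratio up; longer chunks or quantized values (say 1 byte per dimension) pull it down.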

Replies

catlifeonmars • 08/09/2025

Why would the embeddings be higher-dimensional than the data? I imagine the embeddings would have relatively higher entropy (and thus lower redundancy) than many types of source data.
