Hacker News

doctoboggan, last Friday at 8:31 PM

> A vector database for years of emails can easily exceed 50GB.

In 2025 I would consider this a relatively meager requirement.


Replies

andylizf, last Friday at 8:54 PM

Yeah, that's a fair point at first glance. 50GB might not sound like a huge burden for a modern SSD.

However, the 50GB figure was just a starting point for emails. A true "local Jarvis" would need to index everything: all your code repositories, documents, notes, and chat histories. That raw data can easily reach hundreds of gigabytes.

For a 200GB text corpus, a traditional vector index can swell to >500GB. At that point, it's no longer a "meager" requirement. It becomes a heavy "tax" on your primary drive, which is often non-upgradable on modern laptops.

The goal for practical local AI shouldn't just be that it's possible, but that it's also lightweight and sustainable. That's the problem we focused on: making a comprehensive local knowledge base feasible without forcing users to dedicate half their SSD to a single index.

snoman, last Saturday at 4:27 PM

Take the size of whatever you're indexing and multiply it by 16-20x, and that's a good approximation of what the vector db's total size is going to be.
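
For anyone who wants to sanity-check multipliers like these, here is a rough back-of-envelope sketch. It assumes the index is dominated by uncompressed float32 embeddings, one per fixed-size text chunk. The function names, chunk sizes, embedding dimensions, and the `overhead` factor are all illustrative assumptions, not figures from any particular vector database.

```python
def expansion_factor(chunk_bytes, dim, bytes_per_value=4, overhead=1.0):
    """Ratio of embedding storage to raw text size.

    chunk_bytes:     assumed bytes of raw text per embedded chunk
    dim:             assumed embedding dimension
    bytes_per_value: 4 for float32 (assumption: no quantization)
    overhead:        hypothetical multiplier for graph links, metadata, etc.
    """
    return dim * bytes_per_value * overhead / chunk_bytes


def estimate_index_bytes(corpus_bytes, chunk_bytes, dim,
                         bytes_per_value=4, overhead=1.0):
    """Estimated index size in bytes for a corpus of raw text."""
    num_chunks = corpus_bytes // chunk_bytes
    return int(num_chunks * dim * bytes_per_value * overhead)


if __name__ == "__main__":
    GB = 10**9
    # 1 KiB chunks with 768-dim float32 embeddings: 3x the raw text.
    print(expansion_factor(1024, 768))
    # 256-byte chunks with 1024-dim float32 embeddings: 16x the raw text.
    print(expansion_factor(256, 1024))
    # A 200GB corpus under the first set of assumptions, in GB.
    print(estimate_index_bytes(200 * GB, 1024, 768) / GB)
```

Under these assumptions the multiplier is driven almost entirely by chunk size and embedding dimension, so smaller chunks or wider embeddings push it toward the high end, while quantization (a smaller `bytes_per_value`) pulls it down.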
