> A vector database for years of emails can easily exceed 50GB.
In 2025 I would consider this a relatively meager requirement.
Take the size of whatever you're indexing and multiply it by 16 to 20; that's a good approximation of the vector db's total size.
Yeah, that's a fair point at first glance. 50GB might not sound like a huge burden for a modern SSD.
However, the 50GB figure was just a starting point for emails. A true "local Jarvis" would need to index everything: all your code repositories, documents, notes, and chat histories. That raw data can easily be hundreds of gigabytes.
For a 200GB text corpus, a traditional vector index can swell to >500GB. At that point, it's no longer a "meager" requirement. It becomes a heavy "tax" on your primary drive, which is often non-upgradable on modern laptops.
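To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. The chunk size, embedding dimension, and float width are illustrative assumptions, not measurements from any particular vector db, and the function name is hypothetical.

```python
def estimate_index_bytes(corpus_bytes, chunk_bytes=1024, dim=768,
                         bytes_per_value=4, overhead=0.0):
    """Rough size of a flat embedding store for a text corpus.

    All defaults are assumptions for illustration:
    - chunk_bytes: text bytes covered by one embedding
    - dim: embedding dimension
    - bytes_per_value: 4 for float32
    - overhead: extra fraction for graph links, metadata, etc.
    """
    n_chunks = -(-corpus_bytes // chunk_bytes)  # ceiling division
    raw_vectors = n_chunks * dim * bytes_per_value
    return int(raw_vectors * (1 + overhead))


GB = 10**9
corpus = 200 * GB
size = estimate_index_bytes(corpus)
print(f"{size / GB:.0f} GB, {size / corpus:.1f}x the corpus")
# prints "600 GB, 3.0x the corpus"
```

Under these assumptions the stored vectors alone are 3x the raw text. Smaller chunks push the multiplier up quickly: `chunk_bytes=256` gives 12x before any index overhead is counted.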
The goal for practical local AI shouldn't just be that it's possible, but that it's also lightweight and sustainable. That's the problem we focused on: making a comprehensive local knowledge base feasible without forcing users to dedicate half their SSD to a single index.