You already need very high end hardware to run useful local LLMs, so I don't know whether a 200 GB vector database would be the dealbreaker in that scenario. But I wonder how small you could get it with compression and quantization on top.
I've worked in other domains my whole career, so I was astonished this week when we put a million 768-dimensional embeddings into a vector DB and it came out to only a few GB. Napkin math said ~25 GB, and intuition said a long list of widely distributed floats would be fairly incompressible. HNSW is pretty cool.
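For anyone redoing that napkin math (and wondering, as upthread, how far quantization could push things): here's a quick back-of-the-envelope in Python. It's only a sketch under stated assumptions: plain dense vectors, no index or metadata overhead, and simple scalar/binary quantization rather than anything fancier like product quantization.

```python
# Back-of-the-envelope footprint of 1M dense 768-dim embeddings at various precisions.
# Assumptions: raw vectors only; HNSW graph links, IDs, and metadata not included.
n_vectors = 1_000_000
dim = 768

precisions = {
    "float32": 4,               # bytes per component
    "float16": 2,
    "int8 (scalar quant)": 1,
    "binary (1 bit/dim)": 1 / 8,
}

for name, bytes_per_component in precisions.items():
    gb = n_vectors * dim * bytes_per_component / 1e9
    print(f"{name:>20}: {gb:5.2f} GB")
# float32 ~3.07 GB, float16 ~1.54 GB, int8 ~0.77 GB, binary ~0.10 GB
```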
You can already do A LOT with an SLM running on commodity consumer hardware. Also it's important to consider that the bigger an embedding is, the more bandwidth you need to use it at any reasonable speed. And while storage may be "cheap", memory bandwidth absolutely is not.
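To put rough numbers on the bandwidth point: a brute-force scan has to stream every stored vector past the CPU/GPU for each query, so achievable queries per second is capped by memory bandwidth long before storage becomes the problem. A minimal sketch, with assumed (not measured) figures:

```python
# Rough QPS ceiling for a brute-force (no-index) similarity scan, limited purely
# by memory bandwidth. All numbers are illustrative assumptions, not benchmarks.
n_vectors = 1_000_000
dim = 768
bytes_per_component = 4          # float32
bandwidth_gb_s = 100             # ballpark for desktop DDR5; HBM GPUs are far higher

bytes_per_query = n_vectors * dim * bytes_per_component  # every vector touched once
qps_ceiling = bandwidth_gb_s * 1e9 / bytes_per_query

print(f"Data streamed per query: {bytes_per_query / 1e9:.1f} GB")   # ~3.1 GB
print(f"Bandwidth-bound ceiling: ~{qps_ceiling:.0f} queries/sec")   # ~33 QPS
```

An ANN index like HNSW exists largely to avoid touching most of those vectors per query, which is why it stays fast even when the store dwarfs the cache.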
> You already need very high end hardware to run useful local LLMs
A basic MacBook can run gpt-oss-20b, and it's quite useful for many tasks. And fast. Of course, Macs have a huge advantage for local LLM inference due to their unified memory architecture.
The mid-spec 2025 iPhone can run “useful local LLMs” yet has 256 GB of total storage.
(Sure, that’s a spec distortion driven by Apple’s market-segmentation tactics, but given the sheer install base, it’s still a configuration you might want to take into consideration when talking about potential deployment targets for this sort of local-first tech.)
I'm no dev either, and I still set up remote SSH login to be able to use LaTeX on my home PC from my laptop.
Also, with many games and a dual-boot setup on my gaming PC, I still have some space left on my 2 TB NVMe SSD. And my non-enthusiast motherboard could fit two more.
It took so much time to install LaTeX and its packages, and so much space, that my 128 GB drive couldn't handle it.