You can make it even simpler and skip all of this. Even with something as large as 100M vectors, you can just use Torch or GGUF with compression, and NumPy alone can take you a long way. Example below.
https://github.com/neuml/txtai/blob/master/examples/78_Acces...
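
To give a rough idea of the NumPy route, here's a minimal brute-force sketch. This is not the code from the linked example; the `normalize` and `search` helpers, the random data and the sizes are all made up for illustration.

```python
import numpy as np

def normalize(vectors):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

def search(index, query, k=3):
    """Return (ids, scores) for the k rows of index most similar to query."""
    scores = index @ query                      # one matrix-vector product over all rows
    k = min(k, len(scores))
    top = np.argpartition(-scores, k - 1)[:k]   # unordered top-k without a full sort
    top = top[np.argsort(-scores[top])]         # order just those k by score
    return top, scores[top]

# Hypothetical data: 10,000 random 64-dimensional vectors standing in for embeddings
rng = np.random.default_rng(0)
index = normalize(rng.standard_normal((10_000, 64)).astype(np.float32))

query = index[42]                               # query with a vector that is in the index
ids, scores = search(index, query, k=3)
print(ids, scores)
```

That's the whole "database": one array and a matrix multiply. At 100M vectors the array would presumably need to be memory-mapped, stored in a smaller dtype or compressed, which is where the Torch or GGUF options come in.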