Interested to hear more about your experience here. At Halcyon, we have trillions of embeddings, and we found Postgres unsuitable at several orders of magnitude less data than we have now.
On the iterative scan side, how do you prevent this from becoming too computationally intensive with a restrictive pre-filter, or from simply not working at all? We use Vespa, which means effectively doing a map-reduce across all of our nodes; the effective number of graph traversals is smaller, and the computational burden is mostly scanning posting lists on a per-node basis. I imagine that to do something similar in Postgres you'd need sharded tables and complicated application logic to control what you're actually searching.
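(For context, the pgvector-side knobs I'm asking about look roughly like this; a minimal sketch assuming pgvector 0.8+ with an HNSW index and psycopg 3, where the table, column, and filter names are made up:)

```python
# Sketch of a filtered ANN query against pgvector, using the iterative scan
# settings to bound how much work a restrictive pre-filter can trigger.
# "items", "tenant_id", and "embedding" are hypothetical names.
import psycopg

query_vec = "[0.1, 0.2, 0.3]"  # normally produced by your embedding model

with psycopg.connect("dbname=app") as conn:
    with conn.cursor() as cur:
        # Let the HNSW scan keep pulling candidates when the filter discards
        # most of them, but cap the total tuples it is allowed to scan.
        cur.execute("SET hnsw.iterative_scan = 'relaxed_order'")
        cur.execute("SET hnsw.max_scan_tuples = 20000")
        cur.execute("SET hnsw.ef_search = 200")

        cur.execute(
            """
            SELECT id
            FROM items
            WHERE tenant_id = %s              -- restrictive pre-filter
            ORDER BY embedding <=> %s::vector -- cosine distance
            LIMIT 10
            """,
            (42, query_vec),
        )
        rows = cur.fetchall()
```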
How do you deal with re-indexing and/or denormalizing metadata for filtering? Do you simply accept that it'll take hours or days?
I agree with you, however, that vector databases are not a panacea (although they do remove a huge amount of devops work, which is worth a lot!). Vespa supports filtering across parent-child relationships (like a relational database), which means we don't have to reindex a trillion things every time we want to add a new type of filter; with a previous vector database vendor, that kind of reindex took us almost a week.
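Roughly, the query side looks like this; a minimal sketch against Vespa's HTTP query API in a recent version, where the schema, field, endpoint, and rank-profile names are placeholders, and `label` stands for a field imported from a parent document so it can change without touching the child index:

```python
# Nearest-neighbor search combined with a filter on a parent (imported) field.
# Assumes the child schema declares the reference/imported field as an attribute
# and the rank profile declares the query tensor input.
import requests

VESPA_ENDPOINT = "http://localhost:8080/search/"  # hypothetical endpoint

body = {
    "yql": (
        "select * from child_doc "
        "where {targetHits: 100}nearestNeighbor(embedding, q) "
        "and label contains 'some-category'"
    ),
    "input.query(q)": [0.1, 0.2, 0.3],  # query embedding, truncated for brevity
    "ranking": "closeness_profile",      # hypothetical rank profile name
    "hits": 10,
}

resp = requests.post(VESPA_ENDPOINT, json=body)
resp.raise_for_status()
for hit in resp.json()["root"].get("children", []):
    print(hit["id"], hit["relevance"])
```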
We host thousands of forums, but each one has its own database, so we get a sort of free sharding of the data: each instance has fewer than a million topics on average.
I can totally see that at trillion scale for a single shard you want a specialized, dedicated service, but that is also true of most things in tech when you get to extreme scale.