> Also, all this vector stuff is going to fade away as context windows get larger (already started over the past 8 months or so).
People who say this really have not thought it through, or simply don't understand what the use cases for vector search are.
But even if you had infinite context with perfect attention, attention isn't free, and that holds even for linear attention. It is much, much cheaper to index your data once than to reprocess everything on every query. You don't go around scanning entire databases when you're only interested in the row where id=X.
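To make the cost asymmetry concrete, here's a toy sketch (all names hypothetical, with a random projection standing in for a real embedding model, which would be orders of magnitude more expensive): embed the corpus once up front, and each query is then a single matrix-vector product instead of a fresh pass over every document.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM, N_DOCS = 64, 10_000
proj = rng.normal(size=(256, DIM))

def embed(raw_vecs):
    # Stand-in "embedder": one linear pass over the data.
    return raw_vecs @ proj

docs = rng.normal(size=(N_DOCS, 256))

# Index once: the O(N) embedding cost is paid a single time.
index = embed(docs)

def query(q_raw, k=5):
    # Per-query cost: embed one item, then a single matvec over the index.
    q = embed(q_raw[None, :])[0]
    scores = index @ q
    return np.argsort(-scores)[:k]

top = query(docs[42])
```

Without the precomputed `index`, every query would re-run `embed` over all `N_DOCS` documents, which is exactly the "scan the whole database" pattern the analogy warns about.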
IMO RAG works great for some things, while others really do need attention, and hence the completely disparate experiences people report with RAG.
As an example, if one is chunking inputs for RAG, one is basically hardcoding a feature based on locality, which may or may not work. If it works, i.e. it is a good feature (the attention matrix is really tail-heavy, LSTMs would work, etc.), then hey, vector DBs work beautifully. But for many of the things people have trouble with in RAG, the locality assumption is heavily violated, and there you _need_ the full-on attention matrix.
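A toy sketch of what fixed-window chunking bakes in (the chunker and its parameters are hypothetical, just to show the locality assumption): the unit of retrieval is a contiguous span, so any relation between far-apart tokens can never co-occur inside a single retrievable chunk.

```python
def chunk(tokens, size=4, overlap=1):
    # Fixed-window chunker: every chunk is a contiguous slice, so we are
    # implicitly assuming the relevant context is local to each span.
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

doc = list(range(10))  # stand-in for a tokenized document
chunks = chunk(doc)
# A fact whose two halves live at positions 0 and 9 is split across
# chunks: no single retrieved unit ever contains both.
```

When the answer really is local, this hardcoded feature is exactly right and retrieval over such chunks works beautifully; when it isn't, no amount of better embeddings fixes a retrieval unit that structurally cannot contain the evidence.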