You must have missed the “at scale” part. There is nothing inexpensive about extra network hops, cache misses, and page faults implied by your solution. Indexing at scale is almost always lossy for performance reasons. The location where you insert a new record is frequently not the same location as where you have to search for an existing record.
It is resource amplification all the way down. In a lot of systems that index these keys the cost of that check is several times that of doing a blind insert.
You must have missed the “at scale” part. There is nothing inexpensive about extra network hops, cache misses, and page faults implied by your solution. Indexing at scale is almost always lossy for performance reasons. The location where you insert a new record is frequently not the same location as where you have to search for an existing record.
It is resource amplification all the way down. In a lot of systems that index these keys the cost of that check is several times that of doing a blind insert.