> I believe translation should be scale-invariant, and scale should not affect rank ordering
I don't believe this is true for the resulting angles after adding vectors of varying magnitudes.
Consider just 2D: vector A at 90° with magnitude 1.0, vector B at 0° with magnitude 0.5, and vector B' at 0° but normalized to magnitude 1.0.
The vectors (A+B) and (A+B') will have both different magnitudes and different directions.
Thus, cossim(A, A+B') will be notably less than cossim(A, A+B) (about 0.71 vs. 0.89). More generally, if you imagine the whole unit circle as filled with candidate nearest neighbors, (A+B) and (A+B') may have notably different ranked lists of cosine-similarity nearest neighbors.
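A minimal Python sketch of the 2D example, using only the standard library. The two candidate neighbors at 45° and 70° (`C45`, `C70`) are hypothetical, picked only to show that normalizing B can flip the nearest-neighbor ranking of the sum.

```python
import math

def cossim(u, v):
    """Cosine similarity of two 2D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def unit(deg):
    """Unit vector at the given angle in degrees."""
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))

A = (0.0, 1.0)        # 90 deg, magnitude 1.0
B = (0.5, 0.0)        # 0 deg, magnitude 0.5
B_prime = (1.0, 0.0)  # 0 deg, B normalized to magnitude 1.0

AB = add(A, B)              # (0.5, 1.0), about 63.4 deg
AB_prime = add(A, B_prime)  # (1.0, 1.0), 45 deg

print(round(cossim(A, AB), 3))        # 0.894
print(round(cossim(A, AB_prime), 3))  # 0.707

# Hypothetical candidate neighbors on the unit circle.
candidates = {"C45": unit(45), "C70": unit(70)}

def ranking(query):
    """Candidate names sorted by cosine similarity to query, best first."""
    return sorted(candidates, key=lambda name: cossim(query, candidates[name]), reverse=True)

print(ranking(AB))        # ['C70', 'C45']
print(ranking(AB_prime))  # ['C45', 'C70']
```

Only B's magnitude differs between the two sums, yet the sum's direction moves from about 63.4° to 45°, which is enough to reorder the candidates.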
You are totally right of course!
It had slipped my (tired) mind that vector magnitudes are actually discarded in embedding model training.