> It needs to check if that value is less than that for every other alternative to 'queen'.
There you go: the closest 3 words (by L2 distance) to the output vector for the following models, searched over the 2265 most common spoken English words (which include "queen"):
- voyage-3-large: king (0.46), woman (0.47), young (0.52), ... queen (0.56)
- ollama-qwen3-embedding:4b: king (0.68), queen (0.71), woman (0.81)
- text-embedding-3-large: king (0.93), woman (1.08), queen (1.13)
All embeddings are normalized to unit length, so the L2 distances are comparable across models (and rank candidates the same way cosine similarity would).
Thanks!
So despite the superficially "large" distances, 2 of those 3 models are just as good at this particular analogy as Google's 2013 word2vec vectors, in that 'queen' is the closest word to the target once the query words ('king', 'man', 'woman') are disqualified by rule.
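For anyone who wants to reproduce that kind of check, here's a minimal sketch. It assumes a hypothetical `embed()` helper that returns one vector per word from whichever model you're testing, plus the ~2265-word vocabulary mentioned above; the `exclude` argument implements the word2vec-style disqualification rule.

```python
import numpy as np

def unit(v):
    """Scale a vector (or each row of a matrix) to unit L2 length."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def closest_words(target, vocab, vecs, exclude=(), k=3):
    """Rank vocabulary words by L2 distance to `target` (all unit vectors).
    For unit vectors ||a - b||^2 = 2 - 2*cos(a, b), so L2 distance and
    cosine similarity give the same ordering."""
    dists = np.linalg.norm(vecs - target, axis=1)
    order = np.argsort(dists)
    return [(vocab[i], float(dists[i]))
            for i in order if vocab[i] not in exclude][:k]

# vocab = the ~2265 most common spoken English words (including "queen")
# vecs  = unit(embed(vocab))             # hypothetical embed(): one row per word
# king, man, woman = (unit(embed([w]))[0] for w in ("king", "man", "woman"))
# target = unit(king - man + woman)
# closest_words(target, vocab, vecs)                                    # raw top 3, as above
# closest_words(target, vocab, vecs, exclude={"king", "man", "woman"})  # with the rule
```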
But also: to really mimic the original vector math while comparing by L2 distance, I believe you might need to leave the word vectors unnormalized for the 'king' - 'man' + 'woman' calculation, to reflect that the word vectors' varied unnormalized magnitudes may have relevant translational impact, but then make the comparison of the target vector against all candidates between unit vectors (so that L2 distances match the rank ordering of cosine distances). Or, just copy the original `word2vec.c` code's cosine-similarity-based calculations exactly.
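A minimal sketch of that variant, reusing the hypothetical `embed()`/`vocab` setup from the sketch above and assuming the model will even give you unnormalized vectors (many embedding APIs only return unit-length ones):

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def analogy_raw_arithmetic(raw_vecs, vocab, a, b, c, k=3):
    """Target = vec(b) - vec(a) + vec(c) on the raw (unnormalized) vectors,
    e.g. 'king' - 'man' + 'woman'; candidates are then ranked by L2 distance
    between unit vectors, which matches the rank ordering of cosine distance."""
    raw_vecs = np.asarray(raw_vecs, dtype=np.float64)
    idx = {w: i for i, w in enumerate(vocab)}
    target = raw_vecs[idx[b]] - raw_vecs[idx[a]] + raw_vecs[idx[c]]
    dists = np.linalg.norm(unit(raw_vecs) - unit(target), axis=1)
    order = np.argsort(dists)
    return [(vocab[i], float(dists[i]))
            for i in order if vocab[i] not in {a, b, c}][:k]

# raw = embed(vocab)   # hypothetical: unnormalized vectors, if the model exposes them
# analogy_raw_arithmetic(raw, vocab, "man", "king", "woman")
```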
Another wrinkle worth considering, for those who really care about this particular analogical-arithmetic exercise, is that some papers proposed simple changes that could make word2vec-era (shallow neural network) vectors better for that task, and the same tricks might give a lift to larger-model single-word vectors as well.
For example:
- Levy & Goldberg's "Linguistic Regularities in Sparse and Explicit Word Representations" (2014), suggesting a different vector-combination ("3CosMul")
- Mu, Bhat & Viswanath's "All-but-the-Top: Simple and Effective Postprocessing for Word Representations" (2017), suggesting recentering the space & removing some dominant components