Thanks!
So despite the superficially "large" distances, 2 of those 3 are just as good at this particular analogy as Google's 2013 word2vec vectors: 'queen' is the closest word to the target, once the query words ('king', 'woman', 'man') are disqualified by rule.
But also: to really mimic the original vector-math and comparison using L2 distances, I believe you might need to leave the word-vectors unnormalized before the 'king'-'man'+'woman' calculation – to reflect that the word-vectors' varied unnormalized magnitudes may have relevant translational impact. But then ensure that the comparison of the target-vector to all candidates is between unit-vectors, so that L2 distances match the rank ordering of cosine-distances. Or, just copy the original `word2vec.c` code's cosine-similarity-based calculations exactly.
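A minimal sketch of that two-step recipe – raw-magnitude arithmetic first, unit-vector comparison second. The 4-d vectors here are made-up toy values, not real trained embeddings:

```python
import numpy as np

# Hypothetical toy vectors standing in for trained word2vec embeddings.
vectors = {
    "king":  np.array([2.0, 1.0, 0.5, 0.1]),
    "man":   np.array([1.5, 0.2, 0.1, 0.0]),
    "woman": np.array([1.4, 0.3, 0.9, 0.0]),
    "queen": np.array([1.8, 1.1, 1.3, 0.1]),
    "apple": np.array([0.1, 0.1, 0.1, 2.0]),
}

def unit(v):
    return v / np.linalg.norm(v)

def analogy(a, b, c, vecs):
    # Step 1: arithmetic on RAW (unnormalized) vectors, so their varied
    # magnitudes keep whatever translational effect they carry.
    target = unit(vecs[a] - vecs[b] + vecs[c])
    # Step 2: compare among UNIT vectors only. For unit vectors,
    # ||u - t||^2 = 2 - 2*cos(u, t), so L2 distance and cosine
    # similarity give the same rank ordering.
    candidates = {w: float(unit(v) @ target)
                  for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates, key=candidates.get)

best = analogy("king", "man", "woman", vectors)
print(best)  # 'queen' wins among the non-query words in this toy set
```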
Another wrinkle worth considering, for those who really care about this particular analogical-arithmetic exercise, is that some papers proposed simple changes that could make word2vec-era (shallow neural network) vectors better for that task, and the same tricks might give a lift to larger-model single-word vectors as well.
For example:
- Levy & Goldberg's "Linguistic Regularities in Sparse and Explicit Word Representations" (2014), suggesting a different vector-combination ("3CosMul")
- Mu, Bhat & Viswanath's "All-but-the-Top: Simple and Effective Postprocessing for Word Representations" (2017), suggesting recentering the space & removing some dominant components
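For the first of those, a sketch of the 3CosMul scoring rule from Levy & Goldberg (2014): instead of ranking by cosine to a single arithmetic target, it multiplies and divides (shifted) cosine similarities. The toy vectors are hypothetical, not from a real model:

```python
import numpy as np

EPS = 1e-3  # small constant to avoid division by zero, as in the paper

vectors = {
    "king":  np.array([2.0, 1.0, 0.5, 0.1]),
    "man":   np.array([1.5, 0.2, 0.1, 0.0]),
    "woman": np.array([1.4, 0.3, 0.9, 0.0]),
    "queen": np.array([1.8, 1.1, 1.3, 0.1]),
    "apple": np.array([0.1, 0.1, 0.1, 2.0]),
}

def unit(v):
    return v / np.linalg.norm(v)

def shifted_cos(u, v):
    # Cosine similarity rescaled from [-1, 1] into [0, 1],
    # so the multiplicative combination stays well-behaved.
    return (float(unit(u) @ unit(v)) + 1.0) / 2.0

def three_cos_mul(a, b, c, vecs):
    """Solve 'a is to b as c is to ?':
    argmax over d of cos(d, b) * cos(d, c) / (cos(d, a) + EPS)."""
    scores = {}
    for w, v in vecs.items():
        if w in (a, b, c):
            continue
        scores[w] = (shifted_cos(v, vecs[b]) * shifted_cos(v, vecs[c])
                     / (shifted_cos(v, vecs[a]) + EPS))
    return max(scores, key=scores.get)

answer = three_cos_mul("man", "woman", "king", vectors)
print(answer)
```

The multiplicative form penalizes candidates that are close to only one of the cue words, which the additive 'king'-'man'+'woman' form can let slip through.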
Interesting papers, thanks.
> you might need to leave the word-vectors unnormalized before the 'king'-'man'+'woman' calculation – to reflect that the word-vectors' varied unnormalized magnitudes may have relevant translational impact
I believe the translation should be scale-invariant, and scale should not affect the rank ordering.
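That invariance is easy to check numerically for the uniform-scaling case – scaling every vector (and hence the translated target) by the same constant leaves the cosine rank ordering untouched. Random vectors here, just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
words = rng.normal(size=(50, 8))   # 50 hypothetical word vectors

def cosine_ranks(vecs, t):
    sims = (vecs @ t) / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(t))
    return np.argsort(-sims)

def analogy_ranks(vecs, ia, ib, ic):
    # 'king' - 'man' + 'woman'-style translation, then cosine ranking
    t = vecs[ia] - vecs[ib] + vecs[ic]
    return cosine_ranks(vecs, t)

# Scaling ALL vectors by one constant scales the target identically,
# so the rank ordering of cosine similarities is unchanged.
same = np.array_equal(analogy_ranks(words, 0, 1, 2),
                      analogy_ranks(5.0 * words, 0, 1, 2))
print(same)  # True
```

Note this only covers uniform scaling; normalizing each word to unit length before the arithmetic rescales words by different factors, which is the case discussed above.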