Hacker News

smallerize · yesterday at 12:06 AM

That never exactly worked for word2vec either. https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...


Replies

kaycebasques · yesterday at 12:20 AM

From the linked article:

> The widely known example only works because the implementation of the algorithm will exclude the original vector from the possible results!

I saw this issue in the "same topic, different domain" experiment when using EmbeddingGemma with the default task types. But when using custom task types, the vector arithmetic worked as expected. I didn't have to remove the original vector from the results or control for that in any way. So while the criticism is valid for word2vec, I'm skeptical that modern embedding models still have this issue. (A quick sketch of what that exclusion actually does is below.)
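To make the quoted point concrete, here's a minimal numpy sketch of what "excluding the original vector" means in the nearest-neighbor lookup. The vocab and random vectors are toy placeholders, not real embeddings; with actual word2vec vectors, the unfiltered top hit for king - man + woman is typically "king" itself, which is why the classic implementations filter the inputs out.

```python
import numpy as np

# Toy setup: a tiny vocab of unit-normalized "embedding" vectors.
# Names and vectors are illustrative placeholders, not a real model.
rng = np.random.default_rng(0)
vocab = ["king", "queen", "man", "woman", "apple"]
emb = rng.normal(size=(len(vocab), 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def analogy(a, b, c, exclude_inputs=True):
    """Rank vocab by cosine similarity to vec(a) - vec(b) + vec(c)."""
    idx = {w: i for i, w in enumerate(vocab)}
    query = emb[idx[a]] - emb[idx[b]] + emb[idx[c]]
    query /= np.linalg.norm(query)
    sims = emb @ query  # cosine similarity: rows are unit-normalized
    order = np.argsort(-sims)
    results = [(vocab[i], float(sims[i])) for i in order]
    if exclude_inputs:
        # This filter is the step the linked article calls out: without
        # it, the query's own source word often ranks first.
        results = [(w, s) for w, s in results if w not in {a, b, c}]
    return results

print(analogy("king", "man", "woman", exclude_inputs=False)[:3])
print(analogy("king", "man", "woman", exclude_inputs=True)[:3])
```

For what it's worth, gensim's most_similar applies this kind of input filtering by default, which is the implementation behavior the article is describing.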

Very curious to learn whether modern models are still better at some analogies (e.g. male/female) and worse at others, though. Is there any more recent research on that topic? The linked article is from 2019.