
nxa · yesterday at 8:56 PM

Thank you! I actually had a hard time finding prior work on this, so I appreciate the references.

The dictionary is based on WordNet (https://wordnet.princeton.edu/), not word2vec. It's just a plain lookup among precomputed embeddings (generated with mxbai-embed-large). And yes, I'm excluding words that are present in the query.
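For anyone curious about the mechanics, here's a minimal sketch of that kind of pipeline. It assumes the NLTK WordNet corpus and the sentence-transformers release of mxbai-embed-large; the function and variable names are mine, not from the actual project.

```python
# Sketch: precompute embeddings for every WordNet lemma, then answer queries
# by cosine similarity, skipping any word that already appears in the query.
# Assumes: pip install nltk sentence-transformers numpy
import numpy as np
import nltk
from nltk.corpus import wordnet as wn
from sentence_transformers import SentenceTransformer

nltk.download("wordnet", quiet=True)

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# One-time precomputation: embed every single-token lemma in WordNet.
# (This is ~100k+ words, so in practice you'd cache the result to disk.)
words = sorted({w for w in wn.all_lemma_names() if "_" not in w})
word_vecs = model.encode(words, normalize_embeddings=True)

def nearest_words(query: str, k: int = 10) -> list[str]:
    """Return the k dictionary words closest to the query embedding."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = word_vecs @ q  # cosine similarity (vectors are normalized)
    query_tokens = set(query.lower().split())
    results = []
    for i in np.argsort(-scores):
        if words[i] in query_tokens:
            continue  # exclude words present in the query
        results.append(words[i])
        if len(results) == k:
            break
    return results

print(nearest_words("a feeling of quiet melancholy"))
```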

It would be interesting to see how other models perform. I tried one (forgot the name) that was focused on coding, and it didn't perform nearly as well (in terms of human joy from the results).


Replies

kaycebasques · yesterday at 9:29 PM

(Question for anyone) How could I go about replicating this with Gemini Embedding? Generate and store an embedding for every word in the dictionary?
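Not the author, but roughly yes: embed every dictionary word once, store the matrix, then embed the query at lookup time and rank by cosine similarity, as in the sketch above. Here's a rough sketch of the precompute step. It assumes the google-genai Python SDK and the "gemini-embedding-001" model; check the current docs for the exact model name and per-request batch limit, both of which are assumptions here.

```python
# Sketch: precompute Gemini embeddings for a word list and save them to disk.
# Assumes: pip install google-genai numpy
import numpy as np
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
words = ["serenity", "melancholy", "exuberance"]  # in practice: the whole dictionary

vectors = []
BATCH = 100  # illustrative batch size; check the API's actual limit
for i in range(0, len(words), BATCH):
    resp = client.models.embed_content(
        model="gemini-embedding-001",
        contents=words[i : i + BATCH],
    )
    vectors.extend(e.values for e in resp.embeddings)

word_vecs = np.array(vectors, dtype=np.float32)
word_vecs /= np.linalg.norm(word_vecs, axis=1, keepdims=True)  # normalize for cosine
np.savez("word_embeddings.npz", words=np.array(words), vectors=word_vecs)
# Lookup then mirrors the sketch above: embed the query with the same model,
# dot it against word_vecs, and take the top-k, excluding words in the query.
```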
