Hacker News

godelski · 05/14/2025

  data + plural = number
  data - plural = research
  king - crown = (didn't work... crown gets circled in red)
  king - princess = emperor
  king - queen = kingdom
  queen - king = worker
  king + queen = queen + king = kingdom
  boy + age = (didn't work... boy gets circled in red)
  man - age = woman
  woman - age = newswoman
  woman + age = adult female body (tied with man)
  girl + age = female child
  girl + old = female child
The other suggestions are pretty similar to the results I got in most cases. But I think this helps illustrate the curse of dimensionality (i.e. distances are ill-defined in high-dimensional spaces). This is still largely an unsolved problem, and it seems like a critical one that doesn't get enough attention.
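For anyone wondering what these calculators do under the hood, here's a minimal sketch using gensim and a pretrained GloVe model (an assumption on my part; the actual site may use a different model and nearest-neighbour setup entirely): the word vectors are added/subtracted and the nearest neighbours by cosine similarity come back as the "answer".

    import gensim.downloader as api

    # Assumption: a small pretrained GloVe model; the site may use something else.
    model = api.load("glove-wiki-gigaword-50")

    def calc(positive, negative=()):
        # Adds the "positive" vectors, subtracts the "negative" ones, and returns
        # the nearest neighbours by cosine similarity (inputs are excluded).
        return model.most_similar(positive=list(positive), negative=list(negative), topn=3)

    print(calc(["king", "woman"], ["man"]))  # the classic king - man + woman
    print(calc(["king"], ["queen"]))         # king - queen, as above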

Replies

n2d4 · 05/14/2025

For fun, I pasted these into ChatGPT o4-mini-high and asked it for an opinion:

   data + plural    = datasets
   data - plural    = datum
   king - crown     = ruler
   king - princess  = man
   king - queen     = prince
   queen - king     = woman
   king + queen     = royalty
   boy + age        = man
   man - age        = boy
   woman - age      = girl
   woman + age      = elderly woman
   girl + age       = woman
   girl + old       = grandmother

The results are surprisingly good; I don't think I could've done better as a human. But keep in mind that this doesn't do embedding math like OP's tool does! It does show, though, how generic LLMs can solve some tasks better than traditional NLP.

The prompt I used:

> Remember those "semantic calculators" with AI embeddings? Like "king - man + woman = queen"? Pretend you're a semantic calculator, and give me the results for the following:
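If you want to reproduce this programmatically, a rough sketch with the OpenAI Python client would look like the following (the model name here is an assumption; swap in whatever you have access to):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    prompt = (
        'Remember those "semantic calculators" with AI embeddings? '
        'Like "king - man + woman = queen"? Pretend you\'re a semantic '
        "calculator, and give me the results for the following:\n"
        "data + plural\nking - crown\nqueen - king\ngirl + old"
    )

    # Assumption: "o4-mini" as the API model name.
    response = client.chat.completions.create(
        model="o4-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)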

mathgradthrow · 05/14/2025

Distance is extremely well defined in high dimensional spaces. That isn't the problem.
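For what it's worth, the effect usually meant by that phrase is distance concentration: distances stay perfectly well defined, but the contrast between the nearest and farthest neighbour shrinks as dimensionality grows. A quick numpy sketch of that effect (my own illustration, not from either commenter):

    import numpy as np

    rng = np.random.default_rng(0)

    # As dimension grows, the relative gap between the nearest and farthest
    # point from a query shrinks ("distance concentration").
    for dim in (2, 10, 100, 1000):
        points = rng.standard_normal((1000, dim))
        query = rng.standard_normal(dim)
        dists = np.linalg.norm(points - query, axis=1)
        print(dim, (dists.max() - dists.min()) / dists.min())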

Affric · 05/14/2025

Yeah I did similar tests and got similar results.

Curious tool, but not what I would call accurate.

gweinberg · 05/14/2025

I got a bunch of red stuff also. I imagine the author cached embeddings for some words, but not all that many, to save on credits. I gave it mermaid - woman and got merman, but when I tried boar + woman - man or ram + woman - man, it turned out it had never heard of rams or boars.

thatguysaguy · 05/14/2025

Can you elaborate on what the unsolved problem you're referring to is?

sdeframond · 05/15/2025

Such results are inherently limited because the same word can have different meanings depending on context.

The role of the attention layers in LLMs is to give each token a better embedding by accounting for context.
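A quick way to see that context dependence, as a sketch (assuming HuggingFace transformers and a BERT checkpoint; not how the OP's tool works):

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def word_vector(sentence, word):
        # Contextual embedding of `word` (assumed to be a single subtoken).
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]
        idx = (inputs["input_ids"][0] == tokenizer.convert_tokens_to_ids(word)).nonzero()[0].item()
        return hidden[idx]

    river = word_vector("He sat on the bank of the river.", "bank")
    money = word_vector("She deposited the check at the bank.", "bank")
    # Same word, noticeably different vectors once context is accounted for.
    print(torch.cosine_similarity(river, money, dim=0))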

charlieyu1 · 05/16/2025

I think you need to do A - B + C type expressions? A + B or A - B wouldn't make much sense when the magnitude changes.

virgilp · 05/15/2025

hacker+news-startup = golfer

pjc50 · 05/15/2025

Ah yes, 女 (woman) + 子 (child) = girl, but if combined into a single kanji you get 好 = like.