tgv, yesterday at 6:06 PM

There are more differences between English and French than you just described, and they can affect your measurement. Even the corpora you use cannot be the same. There isn't "ceteris paribus" (holding everything else constant). The outcome of the experiment doesn't say anything about the hypothesis.

You're also going to use an artificial neural network to make claims about the human brain? That distance is too large to bridge with a few assumptions.

BTW, nobody believes our language faculties are doing the thinking. There are, however, obvious connections to thought: not only concepts and meaning, but possibly shared neural structures, such as the feedback mechanism that allows us to monitor ourselves.

I have a slightly better proposal: if you want to see the effect of gender, genderize English or neutralize French, and compare both versions of the same language. Careful with tokenization, though.


Replies

adamzwasserman, yesterday at 8:16 PM

The confound concern is fair: no cross-linguistic comparison is perfectly controlled. The bet is that the effect size (if any) will be large enough to be informative despite the noise. But you're right that it's not ceteris paribus in a strict sense.

Your proposal is interesting though. Synthetic manipulation of morphology within a single language. Have you seen this done? The challenge I'd anticipate is that "genderized English" wouldn't have natural text to train on, so you'd need to generate it somehow, which introduces its own artifacts. But comparing French vs artificially gender-neutralized French might be feasible with existing parallel corpora. Worth thinking about as a follow-up.
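
To make the "artificially gender-neutralized French" idea concrete, here is a minimal toy sketch in Python, not something anyone in this thread has built. Everything in it is an assumption for illustration: the `NEUTRAL_FORMS` table and the invented forms "lo", "uno", "iel" and "iels" are hypothetical, and it only touches a handful of determiners and pronouns. Adjective and participle agreement, which carry much of French gender marking, are left alone. Replacements are word-for-word, so whitespace-delimited word counts stay fixed, but a subword tokenizer may still split the invented forms differently from the originals, which is the tokenization caveat raised above.

```python
import re

# Hypothetical mapping from gendered French forms to one invented neutral form.
# These neutral forms are made up for illustration; they are not real French.
NEUTRAL_FORMS = {
    "le": "lo", "la": "lo",
    "un": "uno", "une": "uno",
    "il": "iel", "elle": "iel",
    "ils": "iels", "elles": "iels",
}

# Runs of letters (including accented ones), excluding digits and underscores.
_WORD = re.compile(r"[^\W\d_]+")


def neutralize(text: str) -> str:
    """Replace each mapped gendered word with its neutral form, one word for one word."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        neutral = NEUTRAL_FORMS.get(word.lower())
        if neutral is None:
            return word  # not in the table: leave untouched
        # Preserve sentence-initial capitalization.
        return neutral.capitalize() if word[0].isupper() else neutral

    return _WORD.sub(swap, text)


def same_word_count(original: str, rewritten: str) -> bool:
    """Sanity check: the rewrite should not change the whitespace-delimited word count."""
    return len(original.split()) == len(rewritten.split())


if __name__ == "__main__":
    source = "La table et le livre."
    rewritten = neutralize(source)
    print(rewritten)  # Lo table et lo livre.
    print(same_word_count(source, rewritten))  # True
```

Even in this toy form the artifact problem shows up: "le" and "la" are rewritten whether they are articles or object pronouns, and the surrounding agreement still leaks gender, so a real experiment would need a morphological analyzer rather than a lookup table.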

On the neural network → brain distance: agreed it's a leap. The claim isn't that transformers are brains, but that if both are extracting structure from language, they might reveal something about what structure is there to extract. Fedorenko's own comparison to "early LLMs" suggests she thinks the analogy has some merit.