I’ve wondered about this more generally (i.e., simply prompting in different languages).
For example, if I ask for a pasta recipe in Italian, will I get a more authentic recipe than in English?
I’m curious if anyone has done much experimenting with this concept.
Edit: I looked up Sapir-Whorf after writing. That’s not exactly where my theory started. I’m thinking more about vector embeddings; i.e., the same content in different languages will end up in slightly different positions in vector space. How significantly might that influence the generated response?
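To make the intuition concrete: if an English prompt and its Italian translation map to nearby but non-identical embedding vectors, their cosine similarity will be high but below 1, and that small offset is the kind of difference that could nudge generation. The vectors below are made-up illustrative numbers, not real model embeddings — a toy sketch of the idea, not an actual measurement.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings of the same prompt in two languages.
# A real multilingual model would produce nearby-but-distinct vectors like this.
emb_english = np.array([0.81, 0.12, -0.33, 0.45])
emb_italian = np.array([0.78, 0.15, -0.30, 0.49])

sim = cosine_similarity(emb_english, emb_italian)
print(f"cosine similarity: {sim:.3f}")  # high, but not exactly 1.0
```

In practice you'd replace the toy vectors with output from a real multilingual embedding model and compare translations of the same prompt.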
The answer is yes: LLMs show different behavior and factual retrieval in different languages.
I had some papers about this open earlier today, but I closed them, so now I can't link them ;(
I just tried your experiment, first asking for a bolognese sauce recipe in English, then translating the prompt to Italian and asking again. The recipes did contain some notable differences. Where the English version called for ground beef, the Italian version used a 2:1 mix of beef and pancetta; the Italian version further recommended twice as much wine, half as much crushed tomato, and no tomato paste. The cooking instructions were almost the same, save for twice as long a simmer in the Italian version.
More authentic? Who knows? That's a tricky concept. I do think I'd like to try this robot-Italian recipe next time I make bolognese, though; the difference might be interesting.