Hacker News

tombert yesterday at 6:40 PM

Read the last two paragraphs :)


Replies

svara yesterday at 9:32 PM

The thing is, this is almost certainly what's happening.

You can (or could; maybe they have 'fixed' it by now) get SOTA LLMs to reproduce entire novels nearly verbatim.

The idea of giving it parallel texts of those novels in different languages, to train it on translation, is so obvious it'd just be strange if the AI labs didn't do it.

In fact, DeepL was doing basically that more than 10 years ago.

Wowfunhappy yesterday at 7:23 PM

Oops, I legitimately missed the second-to-last paragraph.

I still think there are better tests you could do. Ideally, you would choose a recently published book (one released after the model's cut-off date) whose translation is considered good. But even something like The Girl With the Dragon Tattoo, which is not particularly new and by no means obscure, would be better than a famous work of literature like The Three Musketeers that has many translations.

card_zero yesterday at 7:19 PM

They say "yes, I admit it, this is all invalid".
