Hacker News

SlinkyOnStairs, today at 2:40 PM

> It's just a token predictor what do you expect?

The point isn't that it's unexpected. It's that prior speech-to-text systems were much better about this particular failure mode: they were prone to spitting out entirely incorrect words, but not to rephrasing entire sentences.

This is a particularly bad failure mode because people don't notice it.

> What we need are tools that embrace that and ping the agent to validate what it just said or double check.

This is not a problem that can be fixed by throwing more AI at it. It's a problem shared by all such systems, whether they're audio-text transformers or LLMs. Agentic review would just push the system further towards creating output that looks correct but is not.

LLM translation does the same: it yields more natural text, but generally not better translation. In several cases, especially "easy" translation between similar languages (e.g. within a language group like Germanic or Nordic), LLM-powered translation is notably worse than more primitive "word & phrase book" systems. It tends to change the meaning of the text in order to produce good grammar, whereas the older systems would give crude or grammatically incorrect translations that still retained the core meaning.


Replies

jacobr1, today at 4:48 PM

Older ML systems were much better at exposing their internal confidence. Plenty of papers recover this kind of interpretability for open-weight models, and all the models exposed logprobs early on. This seems solvable if prioritized: the unintelligible words should come out with lower confidence. Getting per-token data for the output that helps in understanding the predictions is entirely feasible as an engineering effort. It won't be enough to address all the problems, but it should help quite a bit.
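To make the per-token idea concrete, here is a minimal sketch of flagging low-confidence words in a transcript. It assumes the model hands back a (token, logprob) pair for each output token. The function names, the 0.5 threshold, and the sample values are all hypothetical and not taken from any particular API.

```python
import math

def flag_low_confidence(tokens, threshold=0.5):
    """Convert each (text, logprob) pair to (text, probability, flagged).

    A token is flagged when its probability falls below `threshold`.
    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    result = []
    for text, logprob in tokens:
        prob = math.exp(logprob)  # a logprob is the natural log of the probability
        result.append((text, prob, prob < threshold))
    return result

def render(tokens, threshold=0.5):
    """Join the tokens into one string, marking flagged ones as [word?]."""
    parts = []
    for text, _prob, flagged in flag_low_confidence(tokens, threshold):
        parts.append(f"[{text}?]" if flagged else text)
    return " ".join(parts)

# Hypothetical transcript output: made-up logprobs, one token per word.
sample = [
    ("the", -0.01),
    ("patient", -0.05),
    ("takes", -0.10),
    ("metformin", -1.60),  # exp(-1.60) is roughly 0.20, so this one is flagged
    ("daily", -0.20),
]

print(render(sample))  # the patient takes [metformin?] daily
```

This only surfaces tokens the model itself was unsure about. A fluent rephrasing that the model is confident in would pass straight through, which is why it helps without addressing all the problems.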

Semaphor, today at 3:54 PM

I often (ish) translate between English and German, two languages I speak very well. The quality of translation is amazing and far better than what old systems did.

Maybe it depends on the topic or length; for me it's usually 1-2 paragraphs of a German article to share online.