Hacker News

netdevphoenix today at 2:00 PM · 4 replies

It's just a token predictor what do you expect? What we need are tools that embrace that and ping the agent to validate or double-check what it just said. But the trade-off is that this might hamper its capabilities to some degree.


Replies

SlinkyOnStairs today at 2:40 PM

> It's just a token predictor what do you expect?

The point isn't that it's unexpected. It's that prior speech-to-text systems were much better about this particular failure mode: they were prone to spitting out entirely incorrect words, but not to rephrasing entire sentences.

This is a particularly bad failure mode because people don't notice it.

> What we need are tools that embrace that and ping the agent to validate what it just said or double check.

This is not a problem that can be fixed by throwing more AI at it. It's a problem shared by all such systems, whether they're audio-to-text transformers or LLMs. Agentic review would just push the system further towards producing output that looks correct but is not.

LLM translation does the same, yielding more natural text but generally not better translation. In several cases, especially "easy" translation between similar languages (e.g. within a language group like Germanic or Nordic), LLM-powered translation is notably worse than more primitive "word & phrase book" systems: it tends to change the meaning of the text in order to produce good grammar, whereas those older systems would give crude or grammatically incorrect translations that still retained the core meaning.

show 2 replies
ffsm8 today at 2:15 PM

While you're correct about what the audio models are - at least somewhat (they're not exactly like text-based LLMs) - you seem to brush his point away too quickly, before fully exploring it.

This is a solvable issue; the current models and harnesses just aren't built with that assumption - hence they do "best effort while guessing if unsure".

Give it a few more months to years and things will likely settle the way he pitched - at least in the context of note taking: only let it become "lore" if it didn't have to guess a word.

Currently there is basically only one mode, and it's optimized for conversation. Note taking is just glued on with that functionality as the backbone, and that's probably not going to stay.

show 1 reply
jghn today at 2:49 PM

> what do you expect?

If the prediction strength is below X, show an indicator that it couldn't make a valid prediction?
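A minimal sketch of that idea, assuming the transcription system exposes a per-word confidence score (many real ASR APIs report something like this as log-probabilities; the function and data here are hypothetical, not any vendor's actual API):

```python
# Hypothetical sketch: instead of silently guessing, mark any word whose
# confidence falls below a threshold so the reader can spot uncertain spots.

def annotate_transcript(words, threshold=0.85):
    """words: list of (text, confidence) pairs; returns a marked-up string."""
    out = []
    for text, confidence in words:
        if confidence < threshold:
            out.append(f"[{text}?]")  # keep the model's guess, but flag it
        else:
            out.append(text)
    return " ".join(out)

# Example: the model is unsure about one word
words = [("update", 0.98), ("the", 0.99), ("ledger", 0.42), ("tomorrow", 0.95)]
print(annotate_transcript(words))  # → update the [ledger?] tomorrow
```

The threshold value is arbitrary; tuning it trades off noise (too many flags) against silent guesses (too few).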

freejazz today at 3:34 PM

> It's just a token predictor what do you expect?

Someone tell Altman