Hacker News

jaccola · yesterday at 11:04 AM · 4 replies

Do you have any evidence at all for this? I know how LLMs are trained, and this makes no sense to me. Otherwise you'd just put filler words in every input.

e.g. instead of "The square root of 256 is" you'd enter "errr The er square um root errr of 256 errr is" and it would miraculously get better? The model can't differentiate between words you entered and words it generated itself...


Replies

muzani · yesterday at 11:40 AM

It's why it starts with "You're absolutely right!" It's not to flatter the user; it's a cheap way to steer the response toward completions that actually make use of the correction.

mike_hearn · yesterday at 3:39 PM

People have researched pause tokens for this exact reason.
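The pause-token idea can be sketched roughly like this. Everything here is illustrative: `insert_pauses` and the `<pause>` string are hypothetical names, not a real library API; in the actual research the pause tokens are learnable embeddings and the model is trained to produce its answer only after them.

```python
# Hypothetical sketch of pause-token insertion. The model treats the
# pauses as ordinary tokens in the sequence, so each one buys an extra
# forward pass of computation before an answer token must be emitted.

def insert_pauses(tokens, n_pauses, pause_token="<pause>"):
    """Append n_pauses pause tokens after the prompt.

    A pause-token-trained model learns to ignore the outputs at these
    positions and to answer only once the pauses are consumed.
    """
    return tokens + [pause_token] * n_pauses

prompt = ["The", "square", "root", "of", "256", "is"]
padded = insert_pauses(prompt, 3)
print(padded)
# The key point for the thread: this only helps if the model was
# *trained* with pause tokens; sprinkling "errr" into a prompt at
# inference time is not the same thing.
```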

staminade · yesterday at 11:35 AM

What do you think chain-of-thought reasoning is doing, exactly?

lijok · yesterday at 11:30 AM

You’re conflating training and inference