Hacker News

motoboi · today at 2:42 PM · 1 reply

Reflect for a moment on the fact that LLMs are currently just text generators.

Also on the fact that the conversational behavior we see is just the model mimicking example conversations it was trained on: when we feed it “System: you are a helpful assistant. User: let’s talk. Assistant:”, it completes the text in a way that continues the conversation.
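To make that concrete, here's a toy sketch (not any real model's actual chat template, which varies by model family): a "chat" is just a flat string, and the assistant's "reply" is whatever text the model generates after the final "Assistant:" marker.

```python
def build_prompt(system, turns):
    """Flatten a conversation into the single completion prompt the model sees.

    This format is illustrative; real chat templates use model-specific
    special tokens, but the principle is the same.
    """
    parts = [f"System: {system}"]
    for role, text in turns:
        parts.append(f"{role}: {text}")
    # The model's "answer" is simply its continuation of the text after this:
    parts.append("Assistant:")
    return "\n".join(parts)

prompt = build_prompt("You are a helpful assistant.", [("User", "let's talk")])
print(prompt)
```

The model has no notion of "turns" at inference time; it only sees this one string and predicts what text plausibly comes next.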

Yeah, we improved over that using reinforcement learning to steer the text generation into paths that lead to problem solving and more “agentic” traces (“I need to open this file the user talked about to read it and then I should run bash grep over it to find the function the user cited”), but that’s just a clever way we found to let the model itself discover which text generation paths we like the most (or are more useful to us).
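The core idea behind that steering can be sketched as best-of-n / rejection sampling, one of the simpler flavors of this family of techniques (a hypothetical toy, not any lab's actual training loop): sample several generation paths, score each with a reward signal, and prefer the highest-scoring one.

```python
import random

# Hypothetical candidate generation traces the model might sample.
CANDIDATES = [
    "I don't know.",
    "I need to open the file the user mentioned, then grep for the function.",
    "Let me write a poem instead.",
]

def reward(trace):
    # Toy reward: favor traces that look like agentic problem-solving steps.
    # Real reward models are themselves learned, not keyword matchers.
    return sum(word in trace.lower() for word in ("open", "grep", "file"))

def pick_trace(candidates, k=3):
    """Sample k traces and keep the one the reward function likes most."""
    samples = random.choices(candidates, k=k)
    return max(samples, key=reward)
```

In actual RL fine-tuning the model's weights are updated toward high-reward paths rather than filtering at sampling time, but the effect is the same: the model discovers which text-generation paths we find useful.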

So, to address your discomfort: we (humans) trained the model to spit out answers. There are thousands of human beings right now writing nicely thought-out, well-formatted answers to common questions so that we can train the models on them.

If we tried to train the models to mimic long dances into shared meaning, we would probably decrease their utility. And we couldn't do it anyway, because we would need customized text traces for each individual instead of question-answer pairs.

Downvoters: I simplified things a lot here, in the name of understanding, so bear with me.


Replies

MangoToupe · today at 2:50 PM

> Reflect a moment over the fact that LLMs currently are just text generators.

You could say the same thing about humans.
