Hacker News

rytill · today at 1:05 AM

LLMs are not "average text generation machines" once they have context. LLMs learn a distribution.

The moment you start the prompt with "You are an interactive CLI tool that helps users with software engineering at the level of a veteran expert" you have biased the LLM such that the tokens it produces are from a very non-average part of the distribution it's modeling.
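A toy sketch of that idea (not a real LLM, just frequency counts over a made-up corpus): the marginal "average" next-token distribution differs from the distribution conditioned on a persona-like context token. The corpus and token names here are invented for illustration.

```python
from collections import Counter, defaultdict

# Made-up corpus of (persona, _, outcome) triples.
corpus = [
    ("beginner", "code", "buggy"),
    ("beginner", "code", "simple"),
    ("expert", "code", "robust"),
    ("expert", "code", "idiomatic"),
]

# Marginal ("average") distribution over the final token:
# all four outcomes are equally likely.
marginal = Counter(seq[-1] for seq in corpus)

# Distribution conditioned on the persona token in the context:
# conditioning reweights which outcomes are reachable at all.
conditional = defaultdict(Counter)
for persona, _, outcome in corpus:
    conditional[persona][outcome] += 1

print(dict(marginal))               # {'buggy': 1, 'simple': 1, 'robust': 1, 'idiomatic': 1}
print(dict(conditional["expert"]))  # {'robust': 1, 'idiomatic': 1}
```

Sampling from `conditional["expert"]` never yields "buggy": the context has moved the model into a non-average slice of its learned distribution, which is the claim above in miniature.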


Replies

jason_oster · today at 4:56 AM

True, but there's nuance. The model does not react to the "you are an experienced programmer" part of such prompts per se. It reacts to being given relevant information that needs to be reflected in the output.

See examples in https://arxiv.org/abs/2305.14688; they certainly do say things like "You are a physicist specialized in atomic structure ...", but the important point is that the rest of the "expert persona" prompt _calls attention to key details_ that improve the response. The hint about electromagnetic forces in the expert persona prompt is what tipped off the model to mention it in the output.

Calling attention to key details is what makes this work. A great tip for anyone who wants to micromanage code with an LLM is to include precise details about whatever they wish to micromanage: say "store it in a hash map keyed by unsigned integers" instead of letting the model decide which data structure to use.
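As a concrete (hypothetical) illustration of that tip, compare a vague prompt with one that pins down the decisions the model would otherwise make for you. The prompt wording and the example details (key type, eviction policy) are invented here, not from the thread:

```python
# A vague prompt leaves the data-structure choice to the model.
vague = "Write a function that caches user records."

# A precise prompt names the exact details to be micromanaged.
precise = (
    "Write a function that caches user records. "
    "Store them in a hash map keyed by unsigned integers (the user IDs), "
    "and evict the oldest entry when the map exceeds 1000 items."
)

# Every detail named in the precise prompt is a decision you no longer
# have to review after the fact.
for detail in ("hash map", "unsigned integers", "evict"):
    assert detail in precise and detail not in vague
```

The design point is the same one made about the expert-persona paper: the extra tokens work because they surface key details the output must reflect, not because of the framing alone.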