
AlexCoventry · today at 8:25 AM

The warmup process is necessary because an MCMC chain starts from an arbitrary point and has to find the high-probability regions (the typical set) of the target distribution before its samples are representative. That's not an issue for an LLM, since it's trained to sample directly, token by token, from a distribution that looks like natural language; its first draw already comes from the target.
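
To make the contrast concrete, here's a minimal Python sketch. The random-walk Metropolis chain and the toy LM are my own illustrative stand-ins (a sketch, not anyone's production sampler): the chain's early samples are biased by its starting point and get discarded as warmup, while the autoregressive sampler needs no warmup at all.

```python
import math
import random

def metropolis(logp, x0, n_steps, step=0.5):
    """Random-walk Metropolis: early samples are biased toward x0, hence warmup."""
    x, samples = x0, []
    for _ in range(n_steps):
        prop = x + random.gauss(0.0, step)
        log_alpha = logp(prop) - logp(x)
        if log_alpha >= 0 or random.random() < math.exp(log_alpha):
            x = prop
        samples.append(x)
    return samples

logp = lambda x: -0.5 * x * x          # target: standard normal
chain = metropolis(logp, x0=50.0, n_steps=5000)
usable = chain[1000:]                  # discard warmup; chain[:1000] still remembers x0=50

# An autoregressive LM has no analogous warmup: each token is an exact
# draw from the model's learned conditional distribution.
vocab = ["the", "cat", "sat", "."]
def toy_lm(prefix):
    return [1.0 / len(vocab)] * len(vocab)   # stand-in for softmax(logits(prefix))

tokens = []
for _ in range(5):
    tokens.append(random.choices(vocab, weights=toy_lm(tokens))[0])
# `tokens` is a valid sample from the toy model from the very first draw.
```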

There is some work on using MCMC to sample from the higher-probability regions of an LLM's distribution [1], but that's a separate concern. Nobody doubts that an LLM samples from its target distribution from the first token it outputs.
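
For a sense of what MCMC over an LLM could look like mechanically, here's a hedged sketch of one generic construction: Metropolis-Hastings over whole sequences, targeting a sharpened distribution p(x)^beta so the chain drifts toward higher-probability text. This is my own illustration, not the algorithm of [1]; `seq_logprob` and `resample_suffix` are hypothetical helpers you'd implement against a real model.

```python
import math
import random

def mh_over_sequences(seq_logprob, resample_suffix, x0, n_steps, beta=2.0):
    """MH over whole sequences, targeting p(x)^beta to favor high-probability text.

    seq_logprob(x): log p(x) under the LM (hypothetical helper).
    resample_suffix(x): proposes x' by regenerating a suffix with the LM,
        returning (x', log q(x'|x), log q(x|x')) (hypothetical helper).
    """
    x = x0
    for _ in range(n_steps):
        prop, log_q_fwd, log_q_rev = resample_suffix(x)
        # Accept/reject with proposal correction, since resampling a suffix
        # from the LM is not a symmetric proposal.
        log_alpha = (beta * (seq_logprob(prop) - seq_logprob(x))
                     + log_q_rev - log_q_fwd)
        if log_alpha >= 0 or random.random() < math.exp(log_alpha):
            x = prop
    return x
```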

[1] https://arxiv.org/abs/2510.14901