
jgammell · today at 1:35 AM

When sampling from an LLM, people normally truncate the token probability distribution (e.g. with top-k or top-p/nucleus sampling) so that low-probability tokens are never sampled. So the model shouldn't produce really weird outputs even if those tokens technically have nonzero probability under the model after pre/post-training.
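
To make the idea concrete, here's a rough numpy sketch of top-p (nucleus) sampling, one common truncation scheme; the function name `sample_truncated` and the example logits are made up for illustration:

    import numpy as np

    def sample_truncated(logits, top_p=0.9, rng=None):
        """Nucleus (top-p) sampling: keep only the smallest set of tokens
        whose cumulative probability reaches top_p, renormalize, sample."""
        rng = rng or np.random.default_rng()
        # Softmax (shifted by the max for numerical stability).
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Sort tokens by descending probability and find the smallest
        # prefix whose cumulative mass covers top_p.
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        cutoff = np.searchsorted(cum, top_p) + 1
        kept = order[:cutoff]
        # Renormalize over the kept tokens; everything else has
        # exactly zero chance of being drawn.
        kept_probs = probs[kept] / probs[kept].sum()
        return rng.choice(kept, p=kept_probs)

    # The last token has nonzero probability under the model, but it
    # falls outside the nucleus and can never be sampled.
    logits = np.array([5.0, 4.5, 4.0, -3.0])
    print(sample_truncated(logits, top_p=0.9))  # only ever 0, 1, or 2

The point is that truncation turns "astronomically unlikely" into "impossible": once a token is cut from the nucleus, its sampling probability is exactly zero, not merely small.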