Hacker News

anon373839 · today at 12:21 AM

The model outputs a probability distribution for the next token, given the sequence of all previous tokens in the context window. It’s just a list of floats in the same order as the list of tokens that the tokenizer uses.
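A minimal sketch of that last step, using a hypothetical five-word vocabulary and made-up logit values (a real tokenizer has tens of thousands of entries): the model emits one raw score (logit) per vocabulary entry, and a softmax turns those scores into a probability distribution in the same order as the vocabulary.

```python
import math

# Hypothetical toy vocabulary; real vocabularies have tens of thousands of tokens.
vocab = ["the", "cat", "sat", "on", "mat"]

# Made-up raw model outputs (logits) for the next token, one per vocab entry.
logits = [2.1, 0.3, -1.0, 0.5, 1.7]

# Softmax converts logits into probabilities, still in vocab order.
# Subtracting the max logit first is a standard numerical-stability trick.
m = max(logits)
exps = [math.exp(x - m) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# probs[i] is the model's probability that vocab[i] is the next token.
```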

After that, a piece of software that is NOT the LLM chooses the next token. This is called the sampler. There are different sampling parameters and strategies available, but if you want repeatable* outputs, just take the token with the highest probability (greedy decoding).

* Perfect determinism in this sense is difficult to achieve, because GPU calculations are not bit-for-bit reproducible: floating-point addition isn't associative, and the order of parallel reductions can vary between runs. But you can get very close.
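A sketch of the two sampling strategies mentioned above, assuming the distribution `probs` from the model as input (function names are illustrative, not from any particular library). Greedy decoding is deterministic; temperature sampling draws randomly from a reweighted distribution.

```python
import random

def greedy_sample(probs):
    # Deterministic: always return the index of the highest-probability token.
    return max(range(len(probs)), key=lambda i: probs[i])

def temperature_sample(probs, temperature=1.0, rng=random):
    # Stochastic: raising each probability to 1/T and renormalizing is
    # equivalent to applying softmax(logits / T), the usual temperature trick.
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    return rng.choices(range(len(probs)), weights=[w / total for w in weights])[0]
```

With temperature below 1 the distribution sharpens toward the greedy choice; above 1 it flattens, making unlikely tokens more probable.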


Replies

2ndorderthought · today at 12:27 AM

I'm not sold that the LLM is really an LLM without a sampler, but it's not worth quibbling over. It's part of the statistical model anyway.