What would this look like?
the model generates probabilities for the next token, then you set the probability of not allowed tokens to 0 before sampling (deterministically or probabilistically)
the model generates probabilities for the next token, then you set the probability of not allowed tokens to 0 before sampling (deterministically or probabilistically)