If the probability mass sits on a single next token, the prompt has one precise answer, as in `1 + 1 = `. If the predicted next token shares probability with other tokens, the prompt has several valid answers, as in `position: `.
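One way to make this distinction concrete is to look at the top probability and the entropy of the next-token distribution. The sketch below is only illustrative: the two distributions are made-up numbers rather than real model output, and the names `entropy`, `is_precise`, and the `0.9` threshold are assumptions chosen for the example.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a next-token distribution given as {token: prob}."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def is_precise(probs, top_threshold=0.9):
    """True when a single token holds at least `top_threshold` of the mass.

    The threshold is an arbitrary, hypothetical cutoff.
    """
    return max(probs.values()) >= top_threshold


# Made-up distributions, for illustration only (not real model output).
arithmetic = {"2": 0.97, "3": 0.02, "11": 0.01}                  # after `1 + 1 = `
position = {"absolute": 0.40, "relative": 0.35, "fixed": 0.25}   # after `position: `

for name, dist in [("1 + 1 = ", arithmetic), ("position: ", position)]:
    print(f"{name!r}: precise={is_precise(dist)}, entropy={entropy(dist):.2f} bits")
```

Low entropy with one dominant token suggests a single precise answer; higher entropy with mass spread over several tokens suggests multiple acceptable continuations.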
You can generate answers, and train on them, by varying the length of the generated code.