That’s closer still. But even closer would be:
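```
x = tokenize(input)
i = 0
finish = 0
do {
    p, x = layers(x)   // one pass through the layer stack; p is this pass's stop probability
    finish += p        // naive running total of the stop probabilities
} while (finish < 0.95 && i++ < t_max);
output = lm_head(x)
```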
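Except the stop probabilities don’t accumulate linearly like that; it’s more like a weighted coin model. Each pass flips a coin that comes up “stop” with probability p, so the chance of still running after n passes is the product of the (1 - p) terms, not 1 minus their sum.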
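To make the difference concrete, here is a minimal Python sketch. It assumes each pass’s `p` is the probability of stopping at that pass given that the loop hasn’t already stopped. The stop probabilities are made-up numbers rather than real model outputs, and `linear_total`, `coin_total` and `halting_pass` are hypothetical helpers, not part of any library.

```python
import math
from typing import Sequence


def linear_total(ps: Sequence[float]) -> float:
    """Naive running sum of per-pass stop probabilities (can exceed 1)."""
    return sum(ps)


def coin_total(ps: Sequence[float]) -> float:
    """Chance that a weighted coin, flipped once per pass with P(stop) = p_i,
    has come up 'stop' at least once: 1 - prod(1 - p_i)."""
    return 1.0 - math.prod(1.0 - p for p in ps)


def halting_pass(ps: Sequence[float], threshold: float = 0.95, coin: bool = True) -> int:
    """Return the 1-based pass at which the accumulated stop probability first
    reaches `threshold`, or len(ps) if it never does (a stand-in for t_max)."""
    finish = 0.0         # linear model: running sum of p
    still_running = 1.0  # coin model: probability of no 'stop' so far
    for n, p in enumerate(ps, start=1):
        if coin:
            still_running *= 1.0 - p
            finish = 1.0 - still_running
        else:
            finish += p
        if finish >= threshold:
            return n
    return len(ps)


# Made-up per-pass stop probabilities, purely for illustration.
ps = [0.5] * 5

print(linear_total(ps[:2]))          # 1.0
print(coin_total(ps[:2]))            # 0.75
print(halting_pass(ps, coin=False))  # 2
print(halting_pass(ps, coin=True))   # 5
```

With the same p = 0.5 on every pass, the linear total crosses 0.95 after two passes while the coin model needs five, and under the coin model the accumulated value can never exceed 1.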