
oofbey • last Monday at 12:43 AM

That’s closer still. But even closer would be:

    x = tokenize(input)
    i = 0
    finish = 0
    do {
      p, x = layers(x)
      finish += p
    } while(finish < 0.95 && i++ < t_max);
    output = lm_head(x)

Except the accumulation of the stop probabilities isn’t linear like that - it’s more like a weighted coin model.
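To make the difference concrete, here is a minimal Python sketch of one way to read the "weighted coin" idea. It assumes each pass through the layers flips an independent coin that says "stop" with probability p, so the chance of having stopped by step n is 1 minus the product of the (1 - p) terms. The function names, the 0.95 threshold default, and the `t_max` default are illustrative, not taken from any particular model.

```python
def cumulative_halt_probability(step_probs):
    """Probability of having halted by each step, under the assumption
    that every step is an independent weighted coin flip."""
    survive = 1.0  # probability that no coin has said "stop" yet
    cumulative = []
    for p in step_probs:
        survive *= 1.0 - p
        cumulative.append(1.0 - survive)
    return cumulative


def steps_until_threshold(step_probs, threshold=0.95, t_max=16):
    """Count the steps taken before the cumulative halt probability
    reaches the threshold, capped at t_max (both values are assumptions)."""
    survive = 1.0
    steps = 0
    for p in step_probs[:t_max]:
        survive *= 1.0 - p
        steps += 1
        if 1.0 - survive >= threshold:
            break
    return steps


if __name__ == "__main__":
    probs = [0.5] * 10
    print(cumulative_halt_probability(probs[:3]))
    print(steps_until_threshold(probs))
```

With a per-step stop probability of 0.5, a plain running sum like `finish += p` would cross 0.95 after two steps, while the coin-flip accumulation needs five.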
