Yeah, if layers() is shorthand for layer4(layer3(layer2(layer1(input)))). But sometimes it’s only
output = layers(input)
Or
output = layers(layers(input))
Depends on how difficult the token is.
Or more like,
x = tokenize(input)
i = 0
do { finish, x = layers(x) } while (!finish && i++ < t_max);
output = lm_head(x)
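For anyone who wants to poke at it, here is a rough runnable Python sketch of that loop. It is only an illustration under made-up assumptions: `adaptive_forward`, `toy_layers`, and `toy_lm_head` are hypothetical names, the "layers" just halve a vector, and the finish signal is an arbitrary threshold, not a real halting mechanism. The stop condition mirrors the do/while above, so `layers` runs at least once and at most `t_max + 1` times.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def adaptive_forward(
    x: Vector,
    layers: Callable[[Vector], Tuple[bool, Vector]],
    lm_head: Callable[[Vector], int],
    t_max: int,
) -> Tuple[int, int]:
    """Re-apply `layers` until it reports finish or the pass budget runs out.

    Returns (output, number_of_passes). Mirrors
    `do { ... } while (!finish && i++ < t_max)`: at least one pass,
    at most t_max + 1 passes.
    """
    passes = 0
    while True:
        finish, x = layers(x)
        passes += 1
        if finish or passes > t_max:
            break
    return lm_head(x), passes


def toy_layers(x: Vector) -> Tuple[bool, Vector]:
    # Hypothetical stand-in for the layer stack: halve every value and
    # report "finish" once everything is below an arbitrary threshold.
    x = [v / 2 for v in x]
    finish = max(abs(v) for v in x) < 1.0
    return finish, x


def toy_lm_head(x: Vector) -> int:
    # Hypothetical stand-in for lm_head: index of the largest value.
    return max(range(len(x)), key=lambda i: x[i])


if __name__ == "__main__":
    # An "easy" input finishes after one pass, a "hard" one takes several.
    print(adaptive_forward([1.0, 1.5], toy_layers, toy_lm_head, t_max=8))
    print(adaptive_forward([8.0, 3.0], toy_layers, toy_lm_head, t_max=8))
```

The pass count is returned alongside the output only so you can see how much compute each input used.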
Or more like,