Attention is calculated during the forward pass of the model, which happens in both inference (forward only) and training (forward & backward).
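A minimal PyTorch sketch of that distinction (the tensor shapes and the summed "loss" are placeholders for illustration, not from the original):

```python
import torch
import torch.nn.functional as F

# Toy scaled dot-product attention: computed entirely in the forward pass.
def attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 4, 8, requires_grad=True)
k = torch.randn(1, 4, 8, requires_grad=True)
v = torch.randn(1, 4, 8, requires_grad=True)

# Inference: forward pass only, no gradients tracked.
with torch.no_grad():
    out = attention(q, k, v)

# Training: the same forward pass, then a backward pass through it.
out = attention(q, k, v)
loss = out.sum()   # placeholder loss for illustration
loss.backward()    # backward pass computes gradients w.r.t. q, k, v
```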
Dumb question: Can inference be done in a reverse pass? Outputs predicting inputs?