Hacker News

omneity · today at 4:07 PM

Attention is calculated during the forward pass of the model, which happens in both inference (forward only) and training (forward & backward).
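As a rough sketch of what that means in practice (PyTorch here purely for illustration; the function name attention_forward is made up), attention is just part of the forward computation, and training simply adds a backward pass on top of the same forward pass:

    import torch
    import torch.nn.functional as F

    def attention_forward(q, k, v):
        # Scaled dot-product attention, computed on every forward pass
        # regardless of whether we are training or running inference.
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        weights = F.softmax(scores, dim=-1)
        return weights @ v

    q = k = v = torch.randn(1, 4, 8, requires_grad=True)

    # Inference: forward pass only, no gradients tracked.
    with torch.no_grad():
        _ = attention_forward(q, k, v)

    # Training: the same forward pass, followed by a backward pass
    # that computes gradients with respect to the inputs.
    out = attention_forward(q, k, v)
    out.sum().backward()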


Replies

SubiculumCode · today at 4:41 PM

Dumb question: Can inference be done in a reverse pass? Outputs predicting inputs?
