this is neat, but to me it seems like a circuitous path to skipping autoregression, when the direct path is to just not do autoregression at all. get your answers from a single forward pass, and instead of backprop, do lookups and updates as the same operation.