> nobody has tried to generalize it, for example by combining the recurrence concept with next-token prediction
Here you go: https://arxiv.org/abs/2502.05171