make3, yesterday at 1:08 PM

Transformers are not Markovian. Arguably their whole point is the reverse of Markovian: to efficiently make each new token a function of all previous tokens.
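
To make that concrete, here's a toy sketch of single-head causal self-attention in plain Python. It's not how any real model is implemented: the identity Q/K/V projections, the function name `causal_self_attention`, and the two tiny example sequences are all assumptions for illustration. The point is only that the last position's output changes when you alter the first token, which wouldn't happen if the output depended on just the most recent token, as in a first-order Markov chain.

```python
import math

def causal_self_attention(xs):
    """Toy single-head causal self-attention (hypothetical helper).

    Assumes identity Q/K/V projections, so each token vector acts as its
    own query, key, and value. xs is a list of equal-length float lists.
    """
    d = len(xs[0])
    outputs = []
    for t, query in enumerate(xs):
        # Causal mask: position t only scores positions 0..t.
        scores = [
            sum(q * k for q, k in zip(query, xs[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(sc - m) for sc in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output t is a weighted average of all visible token vectors.
        outputs.append(
            [sum(w * xs[s][i] for s, w in enumerate(weights)) for i in range(d)]
        )
    return outputs

# Two made-up sequences that differ only in the first token.
seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seq_b = [[-1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

last_a = causal_self_attention(seq_a)[-1]
last_b = causal_self_attention(seq_b)[-1]

# The final tokens are identical, yet the final outputs are not.
differs = last_a != last_b
print(differs)
```

The dependence only runs backwards, though: because of the causal mask, changing a later token leaves the outputs at earlier positions untouched.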