logoalt Hacker News

bob1029today at 9:02 AM0 repliesview on HN

Next token prediction is about predicting the future by minimizing the number of bits required to encode the past. It is fundamentally causal and has a discrete time domain. You can't predict token N+2 without having first predicted token N+1. The human brain has the same operational principles.