Yes, technically you can frame an LLM as a Markov chain by defining the "state" as the entire sequence of previous tokens. But this is a vacuous observation: under that definition, literally any deterministic or stochastic process becomes a Markov chain if you make the state space flexible enough. A chess game is a "Markov chain" if the state includes the full board position and move history. The weather is a "Markov chain" if the state includes all relevant atmospheric variables.
The problem is that this definition strips away what makes Markov models useful and interesting as a modeling framework. A "Markov text model" in the usual sense is a low-order Markov model (e.g., n-grams) with a fixed, tractable state space and transitions that depend only on the last k tokens. LLMs aren't that: they condition on arbitrary long-range context (up to the window), not a fixed k. For Markov chains, k is non-negotiable. It's a constant, not a variable. Once you make it a variable, nearly any process can be described as Markovian, and the word becomes useless.
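Concretely, a fixed-order Markov text model looks something like this. A minimal sketch, my own toy corpus and function names; the point is that the state is a tuple of exactly k tokens, fixed at training time:

```python
import random
from collections import defaultdict, Counter

def train_markov(tokens, k):
    """Count transitions keyed on exactly the last k tokens -- the fixed-order state."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - k):
        state = tuple(tokens[i:i + k])
        counts[state][tokens[i + k]] += 1
    return counts

def generate(counts, seed, n, rng):
    """Sample up to n tokens; each step looks only at the last k tokens, nothing earlier."""
    k = len(seed)
    out = list(seed)
    for _ in range(n):
        nxt = counts.get(tuple(out[-k:]))
        if not nxt:
            break  # unseen state: a fixed-order model has no fallback
        out.append(rng.choices(list(nxt), weights=nxt.values())[0])
    return out

corpus = "the cat sat on the mat the cat ran on the mat".split()
model = train_markov(corpus, k=2)
print(generate(model, ("the", "cat"), 5, random.Random(0)))
```

Everything the model will ever do is enumerable up front: the state space is the set of k-grams seen in training, and the transition table is finite. That tractability is exactly what the "state = entire prefix" framing throws away.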
Sure, many things can be modelled as Markov chains; that's why they're useful. But it's a mathematical model, so there's no bound on how big the state is allowed to be. The only requirement is that the current state alone determines the probabilities of the next state, which is exactly how LLMs work. They don't remember anything beyond the last thing they generated; they just have big context windows.
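To make that concrete, here's a toy sketch (not a real LLM, the transition function is a made-up stand-in for a forward pass): the state is the whole token sequence so far, and the next-token distribution is a pure function of that state, which is all the Markov property asks for:

```python
import random

def next_token_probs(state):
    """Stand-in for an LLM forward pass: any function of the full prefix
    qualifies as a Markov transition kernel over sequence-states."""
    if len(state) >= 6:
        return {"<eos>": 1.0}  # toy stopping rule
    last = state[-1] if state else "b"
    # Toy distribution that prefers alternating tokens.
    return {"a": 0.8, "b": 0.2} if last == "b" else {"a": 0.2, "b": 0.8}

def step(state, rng):
    """One Markov transition: new state = old state + one sampled token."""
    probs = next_token_probs(state)
    tok = rng.choices(list(probs), weights=probs.values())[0]
    return state + (tok,)

state = ()
rng = random.Random(0)
while not state or state[-1] != "<eos>":
    state = step(state, rng)
print(state)
```

The chain memorylessly hops between states; it just happens that each state embeds the full history, so nothing is ever forgotten within the window.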