
empiko | yesterday at 10:18 PM | 4 replies

LLMs are indeed Markov chains. The breakthrough is that we are able to efficiently compute well-performing transition probabilities for an enormous number of states using ML.
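
To make that framing concrete, here is a minimal Python sketch of the two views. The names (`transition_table`, `NextTokenModel`, `model`) are illustrative assumptions, not any particular implementation:

```python
from collections import Counter
from typing import Callable

# Classic Markov chain: an explicit table with one entry per state
# (here, a tuple of recent words) actually observed in training.
transition_table: dict[tuple[str, ...], Counter] = {}

def markov_next(state: tuple[str, ...]) -> Counter:
    # A state never seen in the corpus has no entry at all.
    return transition_table.get(state, Counter())

# "LLM as Markov chain" view: the same state -> distribution interface,
# but the distribution is computed by a learned function, so even
# never-before-seen states get usable next-token probabilities.
NextTokenModel = Callable[[tuple[str, ...]], dict[str, float]]

def llm_next(state: tuple[str, ...], model: NextTokenModel) -> dict[str, float]:
    return model(state)
```

Both expose the same state-to-distribution interface; the difference is that the table only covers states seen verbatim in training, while the learned function generalizes to new ones.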


Replies

famouswaffles | yesterday at 10:32 PM

LLMs are not Markov chains unless you contort the meaning of a Markov model state so much that you could even include the human brain.

srean | today at 9:29 AM

They are definitely not Markov chains; they may, however, be Markov models. There's a difference between an MC and an MM.

arboles | today at 2:46 AM

Markov models with more than 3 words as the "context window" produce very unoriginal text in my experience (the corpus used had almost 200k sentences, almost 3 million words), matching the OP's experience. These are by no means large corpora, but I know the problem doesn't go away with a larger corpus.[1] The Markov chain will wander into "valleys" where it reproduces paragraphs of its corpus word for word, because it keeps stumbling onto 4-word sequences it has only seen once. This is because those 4 words act as a single token (one atomic state), not as a context window. Markov chains don't have what LLMs have.

If you use syllable-level tokens in a Markov model, the model can't form real words much beyond the second syllable, and you have no way of making it produce more coherent output other than increasing the token size, which exponentially decreases originality. This is the simplest way I can explain it, though I had to address why scaling doesn't work.

[1] There are roughly 400,000^4 possible 4-word sequences in English (ignoring grammar), meaning only a corpus with 8 times that many words, and with no repetition, could offer two ways to chain each possible 4-word sequence.
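
As a rough illustration of the behavior described above, here is a minimal word-level Markov chain generator. It is only a sketch; the corpus filename, context size, and seed are placeholder assumptions, not the setup the commenter actually used:

```python
import random
from collections import defaultdict, Counter

def build_chain(words, context):
    """Map each `context`-word state to counts of the observed next words."""
    chain = defaultdict(Counter)
    for i in range(len(words) - context):
        state = tuple(words[i:i + context])
        chain[state][words[i + context]] += 1
    return chain

def generate(chain, seed, length):
    """Random-walk the chain, starting from a seed state."""
    out = list(seed)
    state = tuple(seed)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:            # state never seen in the corpus: dead end
            break
        nexts, weights = zip(*followers.items())
        out.append(random.choices(nexts, weights=weights)[0])
        state = tuple(out[-len(seed):])
    return " ".join(out)

# "corpus.txt" is a placeholder; any plain-text file will do.
words = open("corpus.txt", encoding="utf-8").read().split()
chain = build_chain(words, context=4)

# With context=4 and a corpus of a few million words, most states occur
# exactly once, so `followers` usually has a single entry and the walk
# copies the source text verbatim until it reaches an ambiguous state.
print(generate(chain, words[:4], length=100))
```

Shrinking `context` to 1 or 2 makes the output more "original" but quickly incoherent, which is the trade-off the comment describes.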

cwyers | yesterday at 10:37 PM

Yeah, there are only two differences between using Markov chains to predict words and LLMs:

* LLMs don't use Markov chains
* LLMs don't predict words
