Hacker News

wizzwizz4 | last Sunday at 12:31 AM | 1 reply

A GPT model would be modelled as an n-gram Markov model where n is the size of the context window. This is slightly useful for getting some crude bounds on the behaviour of GPT models in general, but is not a very efficient way to store a GPT model.
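To put rough numbers on why it's not an efficient way to store one (illustrative figures only, assumed rather than any particular model's real config):

    import math

    # Illustrative numbers, assumed for the sake of the estimate.
    vocab_size = 50_000      # assumed token vocabulary size
    context_length = 2_048   # assumed context window, i.e. the "n" in the n-gram view

    # A literal transition table would need one row per possible context:
    log10_states = context_length * math.log10(vocab_size)
    print(f"distinct contexts ~ 10^{log10_states:.0f}")  # roughly 10^9623 rows

So the Markov view gives you crude bounds on behaviour, but the explicit table is astronomically larger than the network's weights.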


Replies

chpatrick | last Sunday at 1:19 AM

I'm not saying it's an n-gram Markov model or that you should store it as a lookup table. Markov models are just a mathematical concept; they don't say anything about storage, only that the state-transition probabilities are a pure function of the current state.
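A minimal sketch of what that framing actually claims (hypothetical names, Python just for illustration; the point is the type, not any implementation):

    from typing import Mapping

    Token = int
    State = tuple[Token, ...]   # the whole context window is the Markov state

    def transition(state: State) -> Mapping[Token, float]:
        """Markov property: the next-token distribution is a pure function of
        the current state (here, the current context window), regardless of
        how that function is computed or stored, e.g. a lookup table or a
        neural network.
        """
        raise NotImplementedError  # placeholder; any pure function of `state` qualifies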
