> An n-gram model’s defining constraint is: there exists a small constant k such that the next-token distribution depends only on the last k tokens, full stop.
I don't necessarily agree with GP, but I also don't think that a markov chain and markov generator definitions include the word "small".
That constant can be as large as you need it to be.