It's a model of language, but it's not structurally anything like an LLM.
It’s literally a trigram (character) language model. Check any NLP book from before 2015 or so.
LLMs have more stuff bolted onto them (embeddings, RLHF), but the autoregressive core is a direct descendant of that sort of language model.
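
For anyone who hasn't seen one, here is a rough sketch of what a character trigram model looks like, assuming plain count-based estimation with no smoothing. The function names (`train_char_trigram`, `generate`) and the toy corpus are made up for illustration and are not taken from the code under discussion. The loop in `generate` is the autoregressive part: predict the next character from the previous two, append it, repeat.

```python
import random
from collections import Counter, defaultdict


def train_char_trigram(text):
    """Count how often each character follows each two-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - 2):
        context, nxt = text[i:i + 2], text[i + 2]
        counts[context][nxt] += 1
    return counts


def generate(counts, seed, length, rng=None):
    """Sample one character at a time, feeding each output back in as context."""
    rng = rng or random.Random(0)  # fixed seed only so the sketch is repeatable
    out = seed
    for _ in range(length):
        followers = counts.get(out[-2:])
        if not followers:
            break  # unseen context; a real model would smooth or back off here
        chars = list(followers)
        weights = [followers[c] for c in chars]
        out += rng.choices(chars, weights=weights, k=1)[0]
    return out


# Hypothetical toy corpus, just to have something to count.
corpus = "the cat sat on the mat. the cat ate the rat."
model = train_char_trigram(corpus)
sample = generate(model, "th", 40)
print(sample)
```

Swap the count table for a neural network that outputs the next-token distribution and widen the context from two characters to thousands of tokens, and the generation loop keeps the same shape.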