logoalt Hacker News

qeternity01/21/20250 repliesview on HN

Yeah this simply wouldn't work. Models don't have any concept of "themselves". These are just large matrices of floating points that we multiply together to predict a new token.

The context size would have to be in the training data which would not make sense to do.