Yeah, this simply wouldn't work. Models don't have any concept of "themselves": they're just large matrices of floating-point numbers that we multiply together to predict the next token.
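To make the "just matrices" point concrete, here's a toy sketch of next-token prediction as nothing but a matrix-vector multiply plus a softmax. Everything here (sizes, values, names like `W_out`) is made up for illustration; a real model has many stacked layers, but the final step is essentially this:

```python
import math
import random

random.seed(0)
vocab_size, hidden_dim = 10, 4

# The "model" is just tables of floats: an output projection matrix
# (hidden_dim x vocab_size) and an activation vector for the last position.
W_out = [[random.gauss(0, 1) for _ in range(vocab_size)] for _ in range(hidden_dim)]
hidden = [random.gauss(0, 1) for _ in range(hidden_dim)]

# One matrix-vector multiply yields a logit (score) per vocabulary entry.
logits = [sum(hidden[i] * W_out[i][j] for i in range(hidden_dim))
          for j in range(vocab_size)]

# Softmax turns logits into probabilities; greedy decoding picks the max.
m = max(logits)
exps = [math.exp(x - m) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]
next_token = probs.index(max(probs))
```

Nowhere in that computation is there any representation of "the model itself" that it could introspect, which is the point above.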
The context size would have to be baked into the training data, which wouldn't make sense to do: the context window is a property of how the model is run at inference time, not something the model can learn about itself.