Yeah, this simply wouldn't work. Models have no concept of "themselves": they're just large matrices of floating-point numbers that we multiply together to predict the next token.
The context size would have to be in the training data, which wouldn't make sense to do.
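To make the point concrete, here's a toy sketch (all names and numbers invented for illustration): a "model" reduced to two fixed float matrices and a forward pass. There's nowhere in this computation for the model to inspect its own context window; it just maps token ids to logits.

```python
import numpy as np

# Toy "language model": nothing but fixed matrices of floats.
rng = np.random.default_rng(0)
vocab_size, d_model = 8, 4

embed = rng.normal(size=(vocab_size, d_model))   # token id -> vector
w_out = rng.normal(size=(d_model, vocab_size))   # vector -> vocab logits

def predict_next(token_ids):
    # Average the context embeddings (a crude stand-in for attention),
    # then project back to vocabulary logits.
    h = embed[token_ids].mean(axis=0)
    logits = h @ w_out
    # Softmax over the vocabulary; the "prediction" is just the argmax.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(probs.argmax())

print(predict_next([1, 5, 2]))  # a token id in [0, 8) -- that's all it "knows"
```

Everything the model "knows" about itself would have to show up as token statistics in training data, which is exactly the problem the comment above describes.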