Yeah, this simply wouldn't work. Models have no concept of "themselves": they're just large matrices of floating-point numbers that we multiply together to predict the next token.
The context size would have to be in the training data, which wouldn't make sense to do.
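To make the point concrete, here's a toy sketch (all names and numbers invented for illustration): a "model" reduced to two fixed float matrices and a forward pass. There's nowhere in this computation for the model to inspect its own context window; it just maps token ids to logits.

```python
import numpy as np

# Toy "language model": nothing but fixed matrices of floats.
rng = np.random.default_rng(0)
vocab_size, d_model = 8, 4

embed = rng.normal(size=(vocab_size, d_model))   # token id -> vector
w_out = rng.normal(size=(d_model, vocab_size))   # vector -> vocab logits

def predict_next(token_ids):
    # Average the context embeddings (a crude stand-in for attention),
    # then project back to vocabulary logits.
    h = embed[token_ids].mean(axis=0)
    logits = h @ w_out
    # Softmax over the vocabulary; the "prediction" is just the argmax.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(probs.argmax())

print(predict_next([1, 5, 2]))  # a token id in [0, 8) -- that's all it "knows"
```

Everything the model "knows" about itself would have to show up as token statistics in training data, which is exactly the problem the comment above describes.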