meltyness 6 days ago | on: Are LLMs able to notice the “gorilla in the data”?

Does anyone know whether tokenizers are pruned? That is, if a token never appears in the training corpus, is it removed from the model? If so, that pruning step would leak information about the dataset.
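To make the concern concrete, here is a minimal Python sketch of the hypothetical pruning step. It assumes whitespace tokenization and a made-up base vocabulary and corpus. Real tokenizers (BPE, unigram) are trained differently, so this only illustrates why such a step would leak information. It does not show how any particular tokenizer actually works.

```python
from collections import Counter

# Hypothetical base vocabulary and training corpus (illustrative only).
base_vocab = ["the", "cat", "sat", "gorilla", "zebra"]
corpus = ["the cat sat", "the gorilla sat"]


def prune_vocab(vocab, corpus):
    """Keep only tokens that occur at least once in the corpus.

    Assumes whitespace tokenization for simplicity.
    """
    counts = Counter(tok for doc in corpus for tok in doc.split())
    return [tok for tok in vocab if counts[tok] > 0]


def appeared_in_training(token, released_vocab):
    """Under this pruning scheme, membership in the released vocabulary
    reveals whether the token occurred in the training corpus."""
    return token in released_vocab


pruned = prune_vocab(base_vocab, corpus)
print(pruned)
print(appeared_in_training("gorilla", pruned))
print(appeared_in_training("zebra", pruned))
```

Anyone who can inspect the released vocabulary could then run this kind of membership test against it. That is the leak the question is asking about.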