
You seem to be talking about dimensionality reduction; that's not what I meant. Distillation is training a different model with a cheaper architecture (e.g. a CNN or LSTM) on the outputs of an expensive teacher model like BERT. This has nothing to do with dimensions.
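
To make that concrete, here is a minimal PyTorch sketch of the setup described above: a small LSTM "student" trained to match the output distribution of an expensive teacher such as BERT. It assumes the teacher logits have already been computed for each batch; all names, shapes, and hyperparameters are illustrative, not any particular library's API.

    # Minimal knowledge-distillation sketch (assumes PyTorch).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LSTMStudent(nn.Module):
        def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.fc = nn.Linear(hidden_dim, num_classes)

        def forward(self, token_ids):
            x = self.embed(token_ids)
            _, (h, _) = self.lstm(x)   # final hidden state as sentence representation
            return self.fc(h[-1])      # class logits

    def distillation_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
        # Soft targets: KL divergence between temperature-softened distributions,
        # scaled by T^2 as in Hinton et al.'s formulation.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard targets: ordinary cross-entropy on the gold labels.
        hard = F.cross_entropy(student_logits, hard_labels)
        return alpha * soft + (1 - alpha) * hard

    # Toy training step; random tensors stand in for real batches and for
    # the precomputed teacher (e.g. BERT) logits.
    student = LSTMStudent()
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

    token_ids = torch.randint(0, 10000, (32, 64))   # batch of token id sequences
    teacher_logits = torch.randn(32, 2)             # placeholder teacher outputs
    hard_labels = torch.randint(0, 2, (32,))

    loss = distillation_loss(student(token_ids), teacher_logits, hard_labels)
    loss.backward()
    optimizer.step()

The point is that the student's architecture is independent of the teacher's; only the teacher's output distribution is used as the training signal.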


