
Distillation reduces accuracy and removes contextual precision. For example, reducing a whole document to some N (1k or so) dimensions has worked very poorly in my experiments for short queries - typically making relevance worse than basic keyword search.
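
A rough sketch of the kind of setup being described (the vectors, dimensions, and corpus size are stand-ins, not the actual experiment): compress document representations down to ~1k dimensions, then rank them against a short query by cosine similarity.

    # Illustrative only: random vectors stand in for real document/query embeddings.
    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    rng = np.random.default_rng(0)
    doc_vectors = rng.normal(size=(10_000, 4096))   # full document representations
    query_vector = rng.normal(size=(1, 4096))       # short-query representation

    svd = TruncatedSVD(n_components=1024)           # compress to ~1k dimensions
    docs_reduced = svd.fit_transform(doc_vectors)
    query_reduced = svd.transform(query_vector)

    # Rank by cosine similarity in the reduced space; per the comment above,
    # this ranking can end up worse than plain keyword search for short queries.
    scores = cosine_similarity(query_reduced, docs_reduced)[0]
    top10 = np.argsort(-scores)[:10]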



You seem to be talking about dimensionality reduction; that's not what I meant. Distillation is training a different model with a cheaper architecture (CNN, LSTM) on the outputs of an expensive teacher model like BERT. This has nothing to do with dimensions.
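
For concreteness, a minimal distillation sketch in PyTorch (every name here is illustrative, and train_loader is assumed to yield token ids plus precomputed teacher logits): a small LSTM student is trained to match the softened output distribution of a BERT-style teacher rather than the original labels.

    import torch
    import torch.nn as nn

    class StudentLSTM(nn.Module):
        def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_labels=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, num_labels)

        def forward(self, token_ids):               # token_ids: LongTensor (batch, seq_len)
            _, (h, _) = self.lstm(self.embed(token_ids))
            return self.head(h[-1])                  # student logits

    student = StudentLSTM(vocab_size=30_000)
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    loss_fn = nn.KLDivLoss(reduction="batchmean")

    # train_loader is assumed: batches of (token_ids, teacher_logits),
    # where teacher_logits were precomputed by the expensive BERT teacher.
    for token_ids, teacher_logits in train_loader:
        student_logits = student(token_ids)
        loss = loss_fn(torch.log_softmax(student_logits, dim=-1),
                       torch.softmax(teacher_logits, dim=-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()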


You might try vector quantization (instead of PCA) if you just need your 768 features to be smaller. ML features tend to be robust to some perturbation.
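
Product quantization is one way to do that; here is a rough numpy/scikit-learn sketch (block count, codebook size, and corpus size are illustrative): split each 768-d vector into sub-vectors, learn a k-means codebook per block, and store only the byte-sized code indices.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    features = rng.normal(size=(10_000, 768)).astype(np.float32)

    n_blocks, n_codes = 96, 256                 # 96 blocks of 8 dims, 1 byte per block
    blocks = np.split(features, n_blocks, axis=1)

    codebooks, codes = [], []
    for block in blocks:
        km = KMeans(n_clusters=n_codes, n_init=4).fit(block)
        codebooks.append(km.cluster_centers_)
        codes.append(km.labels_.astype(np.uint8))

    codes = np.stack(codes, axis=1)             # (10_000, 96) uint8: 96 bytes/vector vs 3072 for float32

    # Approximate reconstruction for similarity scoring; the perturbation the
    # quantization introduces is the kind ML features are often robust to.
    approx = np.concatenate(
        [codebooks[b][codes[:, b]] for b in range(n_blocks)], axis=1)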


Well it’s one problem or another. If you compress too much you lose the value, and if you leave it too large you have the size problem.

Inverted indices are very efficient. How much of that can you give up, at what trade-off? If I'm only going to be better for 10% of queries, is that a cost-effective solution? What if I spend the same amount of time tuning a traditional engine a bit more and get better accuracy for 5% of queries? Trade-offs rule the world of practical search implementations.
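
As a toy illustration of why inverted indices are so cheap to query (nothing here reflects a real engine): each term maps straight to a posting list, and a query only touches the lists for its own terms.

    from collections import defaultdict

    docs = {
        1: "cheap flights to tokyo",
        2: "tokyo hotel deals",
        3: "cheap hotel near airport",
    }

    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)

    def search(query):
        # Intersect posting lists; cost scales with list sizes, not corpus size.
        lists = [index[t] for t in query.split() if t in index]
        return set.intersection(*lists) if lists else set()

    print(search("cheap tokyo"))   # {1}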



