
I've been working on using BERT for search, for research and training development, with not so great results.

Note the quote: "when it comes to ranking results, BERT will help Search better understand one in 10 searches". This is because of the "keywordese" point they note earlier in the article. Most searches are 1 or 2 words - with queries that short, there isn't much for a similarity function over longer text documents to grab onto for meaningful ranking.

Also, consider what it takes to keep a system like this running. BERT is not practical to use for search results by anyone without the scale of a company like Google. You need a server farm of GPUs to translate all your documents into tensors - and then you have to keep them around somehow! A document of 10k text balloons to ~1MB when converted to a multi-token vector representation. BERT uncased has 768 features - that's 768 floats per token you need to keep around. If you compress it using PCA or by averaging across tokens, you lose all the juicy context that you need for matching and ranking. And there currently isn't a good way to store this stuff yet (though there are active projects to get it into Lucene [1], [2]).
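
For a sense of where that per-document footprint comes from, here's a rough sketch (assuming the HuggingFace transformers library, float32 storage, and bert-base-uncased; the document text is a placeholder, and anything past 512 tokens would need to be chunked into windows):

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    doc = "some long document text ... " * 100  # stand-in for a real document

    with torch.no_grad():
        inputs = tokenizer(doc, return_tensors="pt", truncation=True, max_length=512)
        # last_hidden_state: (batch=1, num_tokens, 768) - one 768-float vector per token
        token_vectors = model(**inputs).last_hidden_state

    num_tokens = token_vectors.shape[1]
    bytes_per_doc = num_tokens * 768 * 4  # float32 = 4 bytes per feature
    print(f"{num_tokens} tokens -> {bytes_per_doc / 1e6:.2f} MB if stored uncompressed")

Multiply that by every document in the index and the storage problem becomes obvious.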

I think this is definitely a great achievement in NLP - but it needs breakthroughs in other areas before it's usable by product teams implementing search over any reasonably large content set.

[1] https://arxiv.org/abs/1910.10208 & https://github.com/castorini/anserini/blob/master/docs/appro...
[2] https://github.com/o19s/hangry




Distillation is usually used today to tame BERT's resource problems at scale - you run BERT to squeeze maximum signal out of your training data and then distill the model into, e.g., a cheap CNN for inference.


Distillation reduces accuracy and removes the contextual precision. For example, reducing a whole document to some N (1k or so) dimensions has worked very poorly in my experiments for short queries - typically making relevance worse than basic keyword search.


You seem to be talking about dimensionality reduction; that's not what I meant. Distillation is training a different model with a cheaper architecture (CNN, LSTM) on the outputs of an expensive teacher model like BERT. It has nothing to do with the number of dimensions.
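
To make the distinction concrete, here's a hedged sketch of one distillation training step for a relevance model - a small student network trained to match the teacher's softened outputs. It assumes PyTorch, and the function and variable names are illustrative rather than from any particular pipeline:

    import torch
    import torch.nn.functional as F

    def distillation_step(student, teacher, batch, optimizer, T=2.0):
        """One training step: the student mimics the teacher's softened logits."""
        with torch.no_grad():
            teacher_logits = teacher(batch)      # expensive model, inference only
        student_logits = student(batch)          # cheap CNN/LSTM, being trained

        # KL divergence between temperature-softened distributions
        # (the standard distillation loss), scaled by T^2
        loss = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The embedding dimensions never change; only the model serving the predictions gets cheaper.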


You might try vector quantization (instead of PCA) if you just need your 768 features to be smaller. ML features tend to be robust to some perturbation.
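
Product quantization, for example. A rough sketch assuming the faiss library (the 96-subvector / 8-bit configuration and the random vectors are just illustrative):

    import numpy as np
    import faiss

    d = 768                                                  # BERT-base hidden size
    vectors = np.random.rand(10_000, d).astype("float32")    # stand-in for real embeddings

    pq = faiss.ProductQuantizer(d, 96, 8)   # 96 sub-vectors x 8 bits = 96 bytes/vector
    pq.train(vectors)
    codes = pq.compute_codes(vectors)       # compressed codes: 96 bytes vs 3072 bytes raw

    approx = pq.decode(codes)               # lossy reconstruction for matching/ranking
    print("compression ratio:", vectors.nbytes / codes.nbytes)  # ~32x

Unlike PCA, this keeps the full 768-dimensional structure and only loses precision within each sub-vector.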


Well it’s one problem or another. If you compress too much you lose the value, and if you leave it too large you have the size problem.

Inverted indices are very efficient. How much of that can you give up, and at what trade-off? If I'm only going to do better on 10% of queries, is that a cost-effective solution? What if I spend the same amount of time tuning a traditional engine a bit more and get better accuracy on 5% of queries? Trade-offs rule the world of practical search implementations.


Just an idea: Maybe you could either train a model or use heuristics to translate from keywordese to English?


Haha I wish! Too much fidelity has been lost already. The model would just be guessing.

The sniff test: if a person can't do it, then a model can't either. Lots of queries look fine for matching, but you really have no idea what the intent or information need of the searcher is.


No, I mean before you feed it into the model.


I’m not sure what you mean. Keywords are keywords. The meaning behind what the user wants is in their head. You can't turn keywords into a sentence without guessing what they meant.



