BERT is truly amazing. Almost all recent innovation in NLP uses BERT and transformers in some way. ALBERT looks like the next big thing for the coming months, as it shows better results than BERT with a small fraction of the parameters.
We did a "Semantic Similarity search" for some documents, where we represent a document as a vector using BERT, and had to look for documents close to a reference document.
The results were breathtaking. It really returned semantically similar documents. You can do it now using ElasticSearch (but you really should do it using Vespa.ai, which is much faster: https://github.com/jobergum/dense-vector-ranking-performance ).
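For anyone who wants to try the Elasticsearch route, here is a minimal sketch, assuming Elasticsearch 7.3+ (which introduced the dense_vector field type) and the 7.x Python client; the index and field names are made up for illustration, and the vectors themselves come from BERT (a sketch of that step is in a reply further down):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # One 768-dim vector per document (768 = BERT-base hidden size).
    es.indices.create(
        index="docs",
        body={"mappings": {"properties": {
            "text": {"type": "text"},
            "vector": {"type": "dense_vector", "dims": 768},
        }}},
    )

    def index_doc(doc_id, text, vector):
        es.index(index="docs", id=doc_id, body={"text": text, "vector": vector})

    def most_similar(query_vector, k=10):
        # script_score reranks by cosine similarity; +1.0 keeps scores non-negative.
        return es.search(index="docs", body={
            "size": k,
            "query": {"script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "cosineSimilarity(params.qv, 'vector') + 1.0",
                    "params": {"qv": query_vector},
                },
            }},
        })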
The first project I ever put together involving (extremely trivial) ML used BERT, and something about seeing it just work opened my eyes to the ML world and got me excited to work in the space.
If anyone is interested in hacking around with BERT, I work on an open-source project called Cortex that handles model deployment, and we have a full tutorial for deploying a sentiment classifier using BERT quickly and easily: https://github.com/cortexlabs/cortex/tree/master/examples/se...
That's very interesting! If you have the time for it, you should consider experimenting with swapping in SpanBERT[1] instead of BERT for your use case. It is trained on full-length segments instead of masked half segments (as in BERT). I suspect that this, besides the improvements SpanBERT brings over BERT, should let you feed bigger chunks (more sentences) into the model before the averaging step, leading to fewer vectors to average and, as a result, perhaps better clustering.
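For what it's worth, the swap itself is small if you use Hugging Face transformers, since SpanBERT keeps BERT's architecture; this is only a sketch, and the "SpanBERT/spanbert-base-cased" checkpoint name is an assumption on my part:

    from transformers import AutoModel, BertTokenizer

    # SpanBERT reuses BERT's cased vocabulary, so the standard tokenizer applies.
    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

    # Assumed community checkpoint name; verify it exists before relying on it.
    model = AutoModel.from_pretrained("SpanBERT/spanbert-base-cased")

    # Everything downstream (encoding, mean-pooling, cosine) stays the same.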
I did not understand. BERT is not a similarity measure. For our use case we used a simple cosine similarity to find the similar documents.
But we have to represent those documents in a vector space. In our tests, representing a document as the MEAN of its BERT embeddings gave very good results. Much better than BoW, GloVe, or Lucene's "More Like This".
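Here is a minimal sketch of that pipeline, assuming Hugging Face transformers and bert-base-uncased; the pooling details are one plausible reading of the comment, not necessarily the exact setup used:

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    def embed(text):
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
        return hidden.mean(dim=1).squeeze(0)            # mean over tokens -> (768,)

    def cosine(a, b):
        return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

    ref = embed("A reference document about neural search.")
    docs = ["Dense retrieval with transformer encoders.", "A recipe for pancakes."]
    ranked = sorted(docs, key=lambda d: cosine(ref, embed(d)), reverse=True)
    print(ranked[0])  # the semantically closer document ranks first

One note on the design: mean-pooling can ignore padding here because each text is encoded on its own; for batched encoding you would mask out padding tokens before averaging.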
We did a "Semantic Similarity search" for some documents, where we represent a document as a vector using BERT, and had to look for documents close to a reference document.
The results where breathtaking. It really returned semantically similar documents. You can do it now using ElasticSearch(But you really should do it using Vespa.ai, it is much faster https://github.com/jobergum/dense-vector-ranking-performance )