BERT is truly amazing. Almost all recent innovation in NLP uses BERT and transformers in some way. ALBERT looks like the next big thing for the coming months, as it shows better results than BERT with a small fraction of the parameters.
We did a "Semantic Similarity search" for some documents, where we represent a document as a vector using BERT, and had to look for documents close to a reference document.
The results were breathtaking. It really returned semantically similar documents. You can do it now using ElasticSearch (but you really should do it using Vespa.ai, which is much faster: https://github.com/jobergum/dense-vector-ranking-performance ).
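For anyone who wants to try the Elasticsearch route, here is a minimal sketch, assuming Elasticsearch 7.3+ (which introduced the dense_vector field type) and the 7.x Python client; the index and field names are made up for illustration, and the vectors themselves come from BERT (a sketch of that step is in a reply further down):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # One 768-dim vector per document (768 = BERT-base hidden size).
    es.indices.create(
        index="docs",
        body={"mappings": {"properties": {
            "text": {"type": "text"},
            "vector": {"type": "dense_vector", "dims": 768},
        }}},
    )

    def index_doc(doc_id, text, vector):
        es.index(index="docs", id=doc_id, body={"text": text, "vector": vector})

    def most_similar(query_vector, k=10):
        # script_score reranks by cosine similarity; +1.0 keeps scores non-negative.
        return es.search(index="docs", body={
            "size": k,
            "query": {"script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "cosineSimilarity(params.qv, 'vector') + 1.0",
                    "params": {"qv": query_vector},
                },
            }},
        })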
The first project I ever put together involving (extremely trivial) ML used BERT, and something about seeing it just work opened my eyes to the ML world and got me excited to work in the space.
If anyone is interested in hacking around with BERT, I work on an open-source project called Cortex that handles model deployment, and we have a full tutorial for deploying a sentiment classifier using BERT quickly and easily: https://github.com/cortexlabs/cortex/tree/master/examples/se...
That's very interesting! If you have the time for it, you should consider experimenting with swapping in SpanBERT[1] instead of BERT for your use case. It is trained on full-length segments instead of masked half segments (as in BERT). I suspect that this, besides the improvements SpanBERT brings over BERT, should let you feed bigger chunks (more sentences) into the model before the averaging step, leading to fewer vectors to average and, as a result, perhaps better clustering.
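For what it's worth, the swap itself is small if you use Hugging Face transformers, since SpanBERT keeps BERT's architecture; this is only a sketch, and the "SpanBERT/spanbert-base-cased" checkpoint name is an assumption on my part:

    from transformers import AutoModel, BertTokenizer

    # SpanBERT reuses BERT's cased vocabulary, so the standard tokenizer applies.
    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

    # Assumed community checkpoint name; verify it exists before relying on it.
    model = AutoModel.from_pretrained("SpanBERT/spanbert-base-cased")

    # Everything downstream (encoding, mean-pooling, cosine) stays the same.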
I did not understand. BERT is not a similarity measure. For our use case we used a simple cosine similarity to find the similar documents.
But we have to represent those documents in a vector space. In our tests, representing a document as the MEAN of its BERT embeddings gave very good results. Much better than BoW, GloVe, or Lucene's "More Like This".
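Here is a minimal sketch of that pipeline, assuming Hugging Face transformers and bert-base-uncased; the pooling details are one plausible reading of the comment, not necessarily the exact setup used:

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    def embed(text):
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
        return hidden.mean(dim=1).squeeze(0)            # mean over tokens -> (768,)

    def cosine(a, b):
        return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

    ref = embed("A reference document about neural search.")
    docs = ["Dense retrieval with transformer encoders.", "A recipe for pancakes."]
    ranked = sorted(docs, key=lambda d: cosine(ref, embed(d)), reverse=True)
    print(ranked[0])  # the semantically closer document ranks first

One note on the design: mean-pooling can ignore padding here because each text is encoded on its own; for batched encoding you would mask out padding tokens before averaging.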
We did a "Semantic Similarity search" for some documents, where we represent a document as a vector using BERT, and had to look for documents close to a reference document.
The results where breathtaking. It really returned semantically similar documents. You can do it now using ElasticSearch(But you really should do it using Vespa.ai, it is much faster https://github.com/jobergum/dense-vector-ranking-performance )