Until very recently, “dense retrieval” was not even as good as bm25, and still i...

marginalia_nu · on Oct 4, 2023

A lot of these things are use-case dependent. Even the characteristics of BM-25 vary a lot depending on whether the query is over- or under-specified, the nature of the query, and so on.

I don't think there will ever be a single answer to what the best way of doing information retrieval over a search-engine-scale corpus of documents is, one that is superior for every type of query.

dathinab · on Oct 4, 2023

more commonly you use approximate KNN vector search with LLM-based embeddings, which can find many fitting documents that bm25 and similar would never manage to find

the tricky part is to properly combine the results
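
one common way to do that combining (not necessarily what anyone in this thread uses) is reciprocal rank fusion, which only looks at rank positions, so the bm25 and vector scores never need to be on the same scale. a minimal sketch, assuming each retriever returns a list of document ids ordered best first; the function name, the doc ids and k=60 (a commonly used default) are all illustrative:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first.

    Each document gets 1 / (k + rank) from every list it appears in
    (rank starts at 1); documents missing from a list get nothing from it.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
bm25_hits = ["d1", "d2", "d3"]
vector_hits = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

documents that both retrievers return (d1, d3 here) end up ahead of documents only one of them found, which is usually what you want from a hybrid setup. whether rank-based fusion beats a weighted sum of normalized scores is, again, use-case dependent.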