
Until very recently, “dense retrieval” was not even as good as BM25, and it still is not always better.
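For reference, here is a minimal sketch of Okapi BM25, the sparse baseline being compared against. It assumes documents and queries are already tokenized into lists of strings, and it uses the common defaults k1=1.5 and b=0.75 plus the non-negative IDF variant used by Lucene; other BM25 variants differ in these details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many docs each term appears at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF, which stays non-negative even for common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


docs = [
    ["dense", "retrieval", "vectors"],
    ["bm25", "sparse", "retrieval"],
    ["cats", "and", "dogs"],
]
print(bm25_scores(["sparse", "retrieval"], docs))
```

Only exact term overlap counts here, which is both why it is cheap and flexible and why it misses documents that use different words for the same idea.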

I think a lot of people use dense retrieval in applications where sparse retrieval is still adequate and much more flexible, simply because dense retrieval has the hype behind it. Hybrid approaches also exist and can help balance the strengths and weaknesses of each.

Vectors can also be used for other tasks, but people largely seem to use them only for retrieval rather than applying them to multiple tasks.




A lot of these things are use-case dependent. Even the characteristics of BM25 vary a lot depending on whether the query is over- or under-specified, the nature of the query, and so on.

I don't think there will ever be a single way of doing information retrieval over a search-engine-scale corpus of documents that is superior for every type of query.


More commonly, you use approximate k-nearest-neighbour (KNN) vector search over LLM-based embeddings, which can find many relevant documents that BM25 and similar methods would never surface.
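A minimal sketch of the idea, assuming the embeddings are already computed: the toy 3-dimensional vectors below are hypothetical stand-ins for real LLM embeddings, and this does exact brute-force KNN by cosine similarity. Approximate KNN indexes (e.g. HNSW-based ones) return roughly the same neighbours while avoiding a scan over every vector.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn(query_vec, doc_vecs, k=2):
    """Exact brute-force k-nearest-neighbour search by cosine similarity.

    Returns (doc_index, similarity) pairs, most similar first.
    """
    scored = ((i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy embeddings; real ones would come from an embedding model.
doc_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
print(knn([1.0, 0.0, 0.0], doc_vecs, k=2))
```

Note that no term overlap is required: a document matches if its vector points in a similar direction to the query's, which is what lets dense retrieval find paraphrases that BM25 cannot.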

The tricky part is combining the results properly.
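One common way to do the combining (not necessarily what the commenter has in mind) is Reciprocal Rank Fusion, which merges ranked lists using only ranks, so BM25 scores and cosine similarities never need to be put on the same scale. A minimal sketch, assuming each retriever returns a list of document ids ordered best first, with k=60 as the commonly used default constant:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    score(d) = sum over lists containing d of 1 / (k + rank), ranks starting at 1.
    Returns doc ids sorted by fused score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from a sparse and a dense retriever.
bm25_ranking = ["a", "b", "c"]
dense_ranking = ["c", "a", "d"]
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
```

Documents that rank reasonably well in both lists ("a" and "c" here) rise above ones that appear in only one, which is usually the behaviour you want from a hybrid setup; the price is that the actual score magnitudes are thrown away.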



