If you just need full text search, assuming you're already using Postgres you ca...

burntsushi · 2024-05-28T12:30:32

AFAIK, PostgreSQL doesn't provide a way to get the IDF of a term, which makes its ranking function pretty limited. tf-idf (and its variants, like Okapi BM25) are kinda table stakes for an information retrieval system IMO.

I'm not saying PostgreSQL's functionality is useless, but if you need ranking based on the relative frequency of a term in a corpus, then I don't believe PostgreSQL can handle that unless something has changed in the last few years. Usually the reason to use something like Lucene or Tantivy is precisely for its ranking support that incorporates inverse document frequency.
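
To make "ranking based on the relative frequency of a term in a corpus" concrete, here is a rough sketch of IDF and an Okapi BM25 score in Python. It is a toy illustration and not how PostgreSQL, Lucene, or Tantivy implement ranking. The function names (`idf`, `bm25`), the tiny corpus, and the parameter defaults `k1=1.2` and `b=0.75` are all assumptions chosen for the example.

```python
import math
from collections import Counter

def idf(term, docs):
    """Smoothed inverse document frequency, in the form commonly used with BM25."""
    n = len(docs)
    df = sum(1 for d in docs if term in d)  # number of documents containing the term
    return math.log((n - df + 0.5) / (df + 0.5) + 1.0)

def bm25(query, doc, docs, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query."""
    avgdl = sum(len(d) for d in docs) / len(docs)  # average document length
    tf = Counter(doc)
    score = 0.0
    for term in query:
        f = tf[term]  # term frequency in this document
        norm = f + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(term, docs) * f * (k1 + 1) / norm
    return score

# Hypothetical three-document corpus, already tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the quick brown fox".split(),
]

query = ["the", "fox"]
ranked = sorted(docs, key=lambda d: bm25(query, d, docs), reverse=True)
```

The point is that "the" appears in every document, so its IDF is close to zero and it barely affects the score, while "fox" appears in only one document and dominates the ranking. Computing that requires corpus-wide document frequencies, which is exactly the piece I'm saying is missing.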

philippemnoel · 2024-05-28T21:03:45

Postgres's FTS is actually quite solid! You can get very far with just the built-in tsvector. The ranking could be improved, though, which was one of the reasons for creating pg_search in the first place: https://github.com/paradedb/paradedb/tree/dev/pg_search (disclaimer: I work on pg_search @ ParadeDB)

burntsushi · 2024-05-29T18:19:20

Okay, but I didn't say it wasn't solid. I just said its ranking wasn't great because it lacks IDFs. It seems like we must be in violent agreement, given that you work on something that must be adding IDFs to PostgreSQL FTS. :P