skeptrune 6 months ago | on: Launch HN: Trellis (YC W24) – AI-powered workflows...

Both full-text (BM25 or SPLADE) and dense vector search have issues with documents of different lengths. Part of what makes recursive sentence splitting work so well is its length-normalization properties: every chunk ends up in a similar size range, so scores are compared across units of similar length.
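To make the length-normalization point concrete, here is a minimal sketch of recursive sentence splitting. It is an illustration under assumptions, not how any particular product does it: the function name `recursive_split`, the `max_chars` budget, and the naive punctuation-based sentence regex are all made up for the example.

```python
import re

def recursive_split(text, max_chars=200):
    """Recursively halve text on sentence boundaries until each chunk fits max_chars.

    Hypothetical helper for illustration; real splitters usually use a proper
    sentence tokenizer and a token budget rather than a character budget.
    """
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]

    # Naive sentence boundary: ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    if len(sentences) > 1:
        mid = len(sentences) // 2
        left = " ".join(sentences[:mid])
        right = " ".join(sentences[mid:])
    else:
        # A single oversized sentence: fall back to splitting on words.
        words = text.split()
        if len(words) == 1:
            return [text]  # nothing left to split on
        mid = len(words) // 2
        left = " ".join(words[:mid])
        right = " ".join(words[mid:])

    return recursive_split(left, max_chars) + recursive_split(right, max_chars)
```

Whether a short email or a long report goes in, what gets indexed is a list of chunks that each fit the same budget, which is the property that keeps BM25 length normalization and embedding pooling from skewing results toward one document length.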

Filters are a really important feature downstream of that, and one this system can provide.

We have also worked with the Enron corpus for demos. Fast, reliable ETL for a set of documents that large is more difficult than it seems, and a commendable problem to solve.

Exciting stuff!

macklinkachorn 6 months ago

Thanks! We are also starting to see a pattern where search systems are improved with filters and hierarchy-level metadata. Another thing people use Trellis for is ingesting data into their downstream LLM applications.
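For readers who want a picture of that pattern, here is a rough sketch of filtering on metadata and a hierarchy path before ranking. Everything in it is hypothetical: the `Chunk` type, the `section_path` and `metadata` fields, and the term-overlap scoring stand in for whatever schema and ranker a real system uses, and none of it is the Trellis API.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    # Hierarchy-level metadata, e.g. ("enron", "legal", "contracts").
    section_path: tuple
    # Flat key/value metadata used for exact-match filters.
    metadata: dict = field(default_factory=dict)

def matches(chunk, filters, path_prefix=()):
    """True if the chunk sits under path_prefix and satisfies every filter."""
    prefix = tuple(path_prefix)
    if chunk.section_path[:len(prefix)] != prefix:
        return False
    return all(chunk.metadata.get(k) == v for k, v in filters.items())

def search(chunks, query, filters=None, path_prefix=(), top_k=3):
    """Filter first, then rank the survivors by simple term overlap."""
    terms = set(query.lower().split())
    candidates = [c for c in chunks if matches(c, filters or {}, path_prefix)]
    scored = [(len(terms & set(c.text.lower().split())), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

The design point is that the filter step shrinks the candidate set before any scoring happens, so the ranker only ever compares chunks that are already in scope.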
