
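Both full-text (BM25 or SPLADE) and dense vector search have issues with documents of different lengths. Part of what makes recursive sentence splitting work so well is its length-normalization properties: every chunk ends up under the same size cap, so long and short documents are compared on similar terms.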
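To make the length-normalization point concrete, here is a rough sketch of a recursive splitter. It is only an illustration: the function name `recursive_split`, the separator order, and the `max_chars` default are my own assumptions, not how any particular library or product does it. The idea is to try coarse separators first and fall back to finer ones only for pieces that are still too long, so every chunk ends up under the same size cap.

```python
def recursive_split(text, max_chars=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: tries coarse separators first (paragraphs),
    then finer ones (lines, sentences, words) for oversized pieces.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []

    for sep in separators:
        parts = [p for p in text.split(sep) if p.strip()]
        if len(parts) < 2:
            continue  # this separator does not divide the text; try a finer one

        # Greedily pack neighbouring parts together while they fit the cap.
        chunks, current = [], ""
        for part in parts:
            candidate = current + sep + part if current else part
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = part
        if current:
            chunks.append(current)

        # Any single part that is still too long gets split with finer separators.
        result = []
        for chunk in chunks:
            result.extend(recursive_split(chunk, max_chars, separators))
        return result

    # No separator applies at all: fall back to a hard cut.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


doc = " ".join(f"Sentence number {i} talks about topic {i}." for i in range(40))
chunks = recursive_split(doc, max_chars=80)
print(max(len(c) for c in chunks))  # never more than 80
```

Because every chunk is bounded by the same cap, a 2-page memo and a 200-page report both turn into units of comparable size before they are scored, which is the property the comment is pointing at.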

Filters are a really important feature downstream of that, and one that this system can provide.

We have also worked with the Enron corpus for demos. Fast, reliable ETL for a document set that large is more difficult than it seems, and a commendable problem to solve.

Exciting stuff!




Thanks! We are also starting to see a pattern where search systems are improved with filters and hierarchy-level metadata. Another use case people have for Trellis is ingesting data into their downstream LLM applications.
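As a rough illustration of filters plus metadata in a search path (a sketch only, not how Trellis works: the `Chunk` class, the `filtered_search` function, and the `folder` and `level` fields are all made up for the example), the usual shape is to narrow candidates by exact-match metadata first and rank only what survives. The ranking here is plain term overlap, standing in for BM25 or a vector score.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. folder, hierarchy level


def filtered_search(chunks, query, filters, top_k=3):
    """Keep chunks whose metadata matches every filter, then rank by term overlap."""
    query_terms = set(query.lower().split())

    # Step 1: metadata filter (exact match on every requested key).
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]

    # Step 2: score the survivors; drop chunks sharing no terms with the query.
    scored = [(len(query_terms & set(c.text.lower().split())), c) for c in candidates]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


corpus = [
    Chunk("quarterly budget review meeting", {"folder": "finance", "level": "section"}),
    Chunk("budget for the team offsite", {"folder": "social", "level": "section"}),
    Chunk("meeting notes on hiring", {"folder": "finance", "level": "section"}),
]
for hit in filtered_search(corpus, "budget meeting", {"folder": "finance"}):
    print(hit.text)
```

Filtering before scoring keeps the ranker from spending effort on chunks that could never be valid answers, and it makes the filter a hard constraint rather than a soft ranking signal.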



