Both full-text search (BM25 or SPLADE) and dense vector search have issues with documents of different lengths. Part of what makes recursive sentence splitting work so well is its length-normalization properties.
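To make the length-normalization point concrete, here is a minimal sketch of recursive sentence splitting, assuming plain whitespace-delimited English text; the separator list and the 512-character cap are illustrative choices, not anyone's production settings.

```python
import re

# Coarse-to-fine boundaries: paragraphs, then sentence ends, then single spaces.
SEPARATORS = [r"\n\n", r"(?<=[.!?]) ", r" "]

def recursive_split(text: str, max_len: int = 512, level: int = 0) -> list[str]:
    """Split at the coarsest boundary that keeps pieces under max_len.

    Oversized pieces are split again at the next, finer separator, so the
    resulting chunks end up roughly comparable in length -- the length
    normalization property mentioned above.
    """
    if len(text) <= max_len or level >= len(SEPARATORS):
        return [text]
    chunks: list[str] = []
    buffer = ""
    for piece in re.split(SEPARATORS[level], text):
        candidate = f"{buffer} {piece}".strip() if buffer else piece
        if len(candidate) <= max_len:
            buffer = candidate          # pack short pieces into one chunk
            continue
        if buffer:
            chunks.append(buffer)       # flush the chunk built so far
            buffer = ""
        if len(piece) <= max_len:
            buffer = piece              # start a fresh chunk from this piece
        else:
            # Still too long: recurse with the next, finer separator.
            chunks.extend(recursive_split(piece, max_len, level + 1))
    if buffer:
        chunks.append(buffer)
    return chunks
```

Calling `recursive_split(document_text, max_len=512)` yields chunks that mostly sit just under the cap, which keeps BM25/SPLADE term statistics and dense embeddings comparable across documents of very different sizes.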
Filters are a really important downstream feature, and one this system can provide.
We have also worked with the Enron corpus for demos; fast, reliable ETL for a document set that large is harder than it seems, and a commendable problem to solve.
Thanks! We're also starting to see patterns where search systems are improved with filters and hierarchy-level metadata. Another use case people use Trellis for is ingesting data into their downstream LLM applications.
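As a rough illustration of the hierarchy-level metadata idea, here is a minimal sketch of filtering retrieval candidates on a hierarchy field before any scoring; the `Chunk` schema and the `hierarchy_level` field are hypothetical, not Trellis's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    doc_id: str
    hierarchy_level: int  # e.g. 0 = document, 1 = section, 2 = paragraph

def filtered_candidates(chunks: list[Chunk],
                        allowed_docs: set[str],
                        level: int) -> list[Chunk]:
    """Apply metadata filters first, so ranking only sees eligible chunks."""
    return [c for c in chunks
            if c.doc_id in allowed_docs and c.hierarchy_level == level]
```

In a real system the same filter would typically be pushed down into the search index rather than applied in application code, but the shape of the query is the same: restrict by metadata first, then rank.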
Exciting stuff!