Unstructured seems to focus mostly on document chunking and data ingestion for RAG pipelines. Trellis handles the process end to end, from extraction to transforming the data into the schema you need for downstream applications.
Unstructured's parsing and extraction are mostly built on traditional OCR and rule-based extraction. We built our entire preprocessing pipeline in an LLM- and vision-model-first way, which lets us stay flexible when the data is complex (like tables and images within documents).