I used OpenAI's function calling (via Langchain's https://python.langchain.com/v0.1/docs/modules/model_io/chat... API).
Some of the challenges I had:
1. Poor recall for some fields, even across a wide variety of input document formats.
2. Needing to experiment with the JSON schema (particularly the field descriptions) to get the best information out and to ignore superfluous information.
3. For each long document, deciding whether to send the whole document in the context or only the most relevant chunks (using traditional text search and semantic vector search).
4. Poor-quality OCR.
From the demo video, it seems like your main innovation is allowing a non-technical user to do #2 in an iterative fashion. Have I understood correctly?
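To make #2 concrete, the sketch below shows roughly what that iteration looks like in code. The field name and descriptions are invented for illustration; the point is that once the schema is built from a dict of descriptions, tightening a description is a one-line change.

```python
def extraction_tool(field_descriptions: dict[str, str]) -> dict:
    """Build an OpenAI-style function/tool schema from field descriptions."""
    return {
        "type": "function",
        "function": {
            "name": "extract_fields",
            "description": "Extract structured fields from the document.",
            "parameters": {
                "type": "object",
                "properties": {
                    name: {"type": "string", "description": desc}
                    for name, desc in field_descriptions.items()
                },
                "required": list(field_descriptions),
            },
        },
    }

# First attempt: vague description, prone to grabbing the wrong number.
v1 = extraction_tool({"invoice_total": "The total."})

# Revised: spells out what to extract and what to ignore.
v2 = extraction_tool({
    "invoice_total": "Grand total due including tax, as printed on the final "
                     "page. Ignore subtotals, line-item amounts, and totals "
                     "quoted inside terms-and-conditions text.",
})
```

Each revision gets passed as the tool schema on the next extraction run, which is exactly the loop a non-technical user would want a UI for.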
We face the same challenges you listed and handle all of the above.
1. Out-of-the-box OCR doesn't perform well on complex documents (those with tables, images, etc.), so we use a vision model to help process them.
2. Recall (for longer documents) and accuracy are also major problems. We built in validation systems and references to help users validate the results.
3. Maintaining these systems in production, integrating with data sources, and refreshing when new data comes in are all quite tedious. We manage that for the end users.
4. For non-technical users, we let them iterate on different business logic and give them one unified place to manage their data workflows.
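On the whole-document-vs-chunks question (#3 in the list above), the decision can be sketched roughly as below. This is only a stand-in: the word count is a crude token estimate, and keyword overlap substitutes for a real hybrid text + vector search.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

def select_context(chunks: list[str], query: str, token_budget: int) -> list[str]:
    """Send the whole document if it fits; otherwise keep the best-scoring chunks."""
    def cost(text: str) -> int:
        return len(text.split())  # crude token estimate (assumption)

    if sum(cost(c) for c in chunks) <= token_budget:
        return chunks  # whole document fits in the context window

    # Stand-in relevance score: keyword overlap with the query.
    scored = sorted(chunks, key=lambda c: -len(tokens(query) & tokens(c)))
    picked, used = [], 0
    for chunk in scored:
        if used + cost(chunk) <= token_budget:
            picked.append(chunk)
            used += cost(chunk)
    return picked

chunks = [
    "Payment terms: net 30 days from invoice date.",
    "Our company was founded in 1987 in Springfield.",
    "Invoice total: $4,210.50 including tax.",
]
context = select_context(chunks, "invoice total amount due", token_budget=10)
```

Under the tiny budget here, only the invoice-total chunk survives; with a large budget the whole document goes through unchanged.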