We are working on native support for a `LOAD PDF` command. In the meantime, you can convert each PDF into a series of images and load them with the `LOAD IMAGE` command. You can then run any text-extraction user-defined function (e.g., `textract` [1]) over the loaded documents and add filters for your constraints, such as PDF author or creation date. Because EVA is designed to run locally, you can use it on private files that never leave your machine. We would love to jointly explore how best to support your text extraction pipeline. Please consider opening an issue with more details on your use case.
[1] https://textract.readthedocs.io/en/stable/python_package.htm...
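To make the interim workaround concrete, here is a rough sketch of the PDF-to-images step in Python. It rests on several assumptions that are not part of the answer above: it uses the third-party `pdf2image` package (which needs Poppler installed) for rendering, the helper names and the `PdfPages` table name are hypothetical, and the exact `LOAD IMAGE ... INTO ...` statement form is assumed, so please check it against the EVA documentation for your version.

```python
from pathlib import Path


def page_image_paths(pdf_path, num_pages, out_dir):
    """Return one output image path per page, e.g. out/report_page_0001.png."""
    stem = Path(pdf_path).stem
    return [
        Path(out_dir) / f"{stem}_page_{page_no:04d}.png"
        for page_no in range(1, num_pages + 1)
    ]


def pdf_to_images(pdf_path, out_dir, dpi=200):
    """Render every page of a PDF to a PNG file and return the file paths.

    Assumption: pdf2image and Poppler are installed. Any other PDF
    renderer would work equally well here.
    """
    from pdf2image import convert_from_path  # imported lazily: optional dependency

    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = page_image_paths(pdf_path, len(pages), out_dir)
    for page, path in zip(pages, paths):
        page.save(path, "PNG")
    return paths


def load_image_query(image_path, table):
    """Build a LOAD IMAGE statement (assumed syntax; table name is hypothetical)."""
    return f"LOAD IMAGE '{Path(image_path).as_posix()}' INTO {table};"
```

Calling `pdf_to_images("report.pdf", "out")` and passing each returned path through `load_image_query(path, "PdfPages")` would give you one statement per page to run in EVA, after which the text-extraction function can be applied to the loaded table.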