So, there is this tool: <https://pypi.org/project/ocrmypdf>
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted.
The project page mentions cross-platform usage, and the tool seems simple. Is there some way to use it with Obsidian through a plugin? If so, please create a GitHub project and send us a link! :D
Just for reference, I found it in a list of a ton of other tools: https://unix.stackexchange.com/questions/301318/how-to-ocr-a...
I have messed around with Tesseract a bit, and at some point I was able to turn a PDF into a text file. I was thinking of doing that and saving the results as Markdown files, which are tiny compared to the originals, then keeping the original PDFs elsewhere. I've been playing with another PDF system I'm not willing to give up on. :D
Point is, I think I can bring more dimension to my Obsidian or Markdown library by having 'txt' copies of my PDFs. The search and tagging alone would be so handy.
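In case it helps anyone try the text-copy idea before a plugin exists, here is a rough Python sketch, not a finished tool. It assumes the `ocrmypdf` command is installed and on your PATH, and it relies on its `--sidecar` option to write the recognized text to a separate file. The function names, the front matter fields, and the folder layout are all my own made-up choices, so adjust to taste.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path


def build_ocr_command(pdf: Path, sidecar_txt: Path, searchable_pdf: Path) -> list[str]:
    """Build an ocrmypdf call that also writes the recognized text to a sidecar file."""
    return ["ocrmypdf", "--sidecar", str(sidecar_txt), str(pdf), str(searchable_pdf)]


def text_to_note(text: str, source_pdf: str, tags: list[str]) -> str:
    """Wrap OCR text in a Markdown note with YAML front matter (hypothetical layout)."""
    lines = ["---", f'source: "{source_pdf}"', "tags:"]
    lines += [f"  - {tag}" for tag in tags]
    lines += ["---", "", text.strip(), ""]
    return "\n".join(lines)


def pdf_to_note(pdf: Path, vault_dir: Path, tags: list[str]) -> Path:
    """OCR one PDF and save only the small Markdown copy into the vault folder."""
    with tempfile.TemporaryDirectory() as tmp:
        sidecar = Path(tmp) / f"{pdf.stem}.txt"
        searchable = Path(tmp) / f"{pdf.stem}.ocr.pdf"
        # Assumption: ocrmypdf is installed; check=True raises if the OCR run fails.
        subprocess.run(build_ocr_command(pdf, sidecar, searchable), check=True)
        text = sidecar.read_text(encoding="utf-8")
    note = vault_dir / f"{pdf.stem}.md"
    note.write_text(text_to_note(text, pdf.name, tags), encoding="utf-8")
    return note
```

Usage would be something like `pdf_to_note(Path("scans/invoice.pdf"), Path("MyVault/OCR"), ["pdf", "ocr"])`, where both paths are placeholders. The original PDF stays wherever it already lives, and the searchable PDF that OCRmyPDF produces is thrown away with the temp folder, so only the tiny `.md` copy ends up in the vault for search and tagging.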