Hacker News

Somewhat OT, but FWIW, I recently stumbled upon the Fonduer project, which uses some interesting extraction methods that go beyond plain OCR. https://hazyresearch.github.io/snorkel/blog/fonduer.html

They also have a PDF-to-tree package. I haven't had good results from it, but perhaps I need to finally learn ML and try training models for this a bit: https://github.com/HazyResearch/pdftotree