rahimnathwani on July 25, 2022 | on: Search PDFs with Transformers and Python Notebook
Under the hood, it uses https://github.com/pdfminer/pdfminer.six, which expects the text to be stored as text.
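A rough way to check whether a given PDF meets that expectation is to run pdfminer.six's `extract_text` on it and see whether anything comes back. The sketch below is only a heuristic: the helper names and the `min_chars` threshold are my own assumptions, not part of pdfminer.six or the notebook. Scanned PDFs with no text layer typically yield little more than whitespace and page breaks.

```python
def looks_like_text_layer(extracted: str, min_chars: int = 20) -> bool:
    """Heuristic (assumed threshold): enough non-whitespace characters were extracted."""
    # str.split() with no arguments also drops form feeds, which pdfminer emits per page.
    visible = "".join(extracted.split())
    return len(visible) >= min_chars


def pdf_has_text_layer(path: str, min_chars: int = 20) -> bool:
    """Run pdfminer.six on the file and apply the heuristic above."""
    # Imported lazily so the heuristic can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return looks_like_text_layer(extract_text(path), min_chars)
```

If `pdf_has_text_layer("paper.pdf")` comes back `False` (the filename is just a placeholder), the file is probably image-only and would need OCR before it can be searched.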
alexcg1 on July 25, 2022
You mean the PDFSegmenter Executor in the notebook?
rahimnathwani on July 25, 2022
Yes
alexcg1 on July 25, 2022
PDFSegmenter also extracts images, which can then be OCR'ed in the next step of the pipeline.
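To make the "OCR in the next step" idea concrete, here is a minimal sketch in plain Python. It is not the PDFSegmenter or Jina API: `Chunk` and `fill_text_with_ocr` are hypothetical names, and the OCR engine is passed in as a callable so any library can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for one segment extracted from a PDF."""
    text: Optional[str] = None     # set for text segments
    image: Optional[bytes] = None  # set for image segments


def fill_text_with_ocr(chunks: List[Chunk], ocr: Callable[[bytes], str]) -> List[str]:
    """Return searchable text per chunk, running OCR only on image chunks."""
    texts = []
    for chunk in chunks:
        if chunk.text:
            # Text was already extracted; no OCR needed.
            texts.append(chunk.text)
        elif chunk.image is not None:
            # Image-only chunk: defer to the supplied OCR function.
            texts.append(ocr(chunk.image))
    return texts
```

The resulting strings could then be encoded and indexed the same way as text that was extracted directly.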