Hacker News

SumatraPDF is among the OSS PDF tools I use.

Since Adobe is monetizing Acrobat more aggressively, I am trying to replace selected PDF workflows with OSS. Here are some of the tools I use.

    qpdf
        removing passwords, unlocking PDFs, conversion
        install in WSL with apt-get install qpdf
        remove password with qpdf --decrypt --password="" input.pdf output.pdf
    PDF4QT - Open Source PDF Editing
        Deleting, Sorting, Extracting Pages
        Currently, no choco release available, must be installed manually from PDF4QT/releases
    Inkscape, LibreOffice Draw
        editing PDFs, adding text
    MuPDF
        Command line tool and Python package for parsing, filling forms, adding text
    SumatraPDF
        Viewing of PDFs
    pdfplumber
        Awesome Python package to extract tables from PDFs into data pipelines. Use with Jupyter Lab
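
The qpdf step in the list above can be scripted. A minimal sketch in Python, assuming qpdf is installed and on PATH; the helper names `qpdf_decrypt_cmd` and `decrypt` and the file names are made up for illustration. The first function only builds the argument list, so nothing runs until it is handed to `subprocess.run`.

```python
import subprocess


def qpdf_decrypt_cmd(src, dst, password=""):
    """Build the qpdf argument list for writing an unencrypted copy of src."""
    # Same flags as the one-liner in the list: --decrypt plus --password=...
    return ["qpdf", "--decrypt", f"--password={password}", src, dst]


def decrypt(src, dst, password=""):
    # Assumes qpdf is on PATH; raises CalledProcessError on a non-zero exit status.
    subprocess.run(qpdf_decrypt_cmd(src, dst, password), check=True)
```

Calling `decrypt("input.pdf", "output.pdf")` would run the same command as the manual one-liner, with an empty user password.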



FYI, you can use Firefox for viewing, signing, and adding text to PDFs. You can also use it to remove a password (just print to PDF after unlocking it).


> you can use Firefox for viewing, signing

I got all excited - then realised "signing" just means inserting a picture. Notably absent are open source tools for digitally signing and verifying PDFs. Apparently pdftk does it in a paid version.

It's funny in a way - in this thread we have people wanting ways to modify a PDF. Yet to me, being able to prove it's not modified (e.g., that it's a statement provably issued by some bank saying they transferred funds to my bank on behalf of person XYZ) is far more important. Instead we have companies offering paid "document signing services" which are built on sand - you can easily forge / modify any signed document they issue.


Okular. Okular offers digital PDF signing.


At least as of Firefox 109, support for non-Latin languages was broken to the point of being completely unusable.


128 characters ought to be enough for everybody.



PDFTK and pdfjam are two other useful command line tools. I use PDFTK for merging PDFs, extracting/deleting/duplicating pages, and decompressing so I can extract and manipulate text/data in raw PDF commands. I use pdfjam for n-up and adjusting page size and margins.


PDFTK can choke on some merges and on some newer PDF features. In those cases one can use Ghostscript for merging and other manipulation.
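
A rough sketch of that fallback in Python, assuming both `pdftk` and `gs` are on PATH (on Windows the Ghostscript console binary is usually named `gswin64c` instead). The function names are hypothetical. Note that Ghostscript's pdfwrite device rewrites the document rather than copying it, so the merged result is worth checking.

```python
import subprocess


def pdftk_merge_cmd(inputs, output):
    # pdftk's "cat" operation concatenates the inputs in the order given.
    return ["pdftk", *inputs, "cat", "output", output]


def gs_merge_cmd(inputs, output):
    # Ghostscript's pdfwrite device writes all inputs into a single PDF.
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
            f"-sOutputFile={output}", *inputs]


def merge(inputs, output):
    """Try pdftk first; fall back to Ghostscript if pdftk is missing or fails."""
    try:
        subprocess.run(pdftk_merge_cmd(inputs, output), check=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        subprocess.run(gs_merge_cmd(inputs, output), check=True)
```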


You mention qpdf is not available in Chocolatey, but it's available in Scoop, another Windows package manager: https://github.com/ScoopInstaller/Main/blob/master/bucket/qp...


For other distributions / OS / package managers see here: https://repology.org/project/qpdf/versions


It seems to be available on winget too:

   > winget search qpdf
   Name Id        Version Source
   ------------------------------
   QPDF QPDF.QPDF 11.6.3  winget


For extracting to tables I've been using http://tabula.technology/ for a couple of years. It seems to do a pretty good job even with some fairly complex tables and I've not had any problems with it.


Yes, tabula is the other table extraction tool. I used both and prefer pdfplumber because it is really robust and works well with Jupyter Lab.


You can also use Okular to visually select a table from a PDF and paste it into some Excel-like software; Okular's table select tool is not too bad.


Actually, SumatraPDF uses MuPDF now. But there are some limitations in rendering PDF and ebook files, for example the formatting of PDF files or the display of Unicode characters in EPUB files.


Do you mind reporting those issues either to SumatraPDF at https://github.com/sumatrapdfreader/sumatrapdf/issues or directly to MuPDF at https://bugs.ghostscript.com/ if it also has the same issue? Thank you!

There are many wonderfully weird PDFs and epubs out there, but we do our best to fix issues. :)



I like k2pdfopt for reformatting pdfs for my e-reader.

I've also used poppler's pdfimages, but I'd prefer something less buggy for my use case; every version I've tried had a problem with one PDF made by Adobe InDesign.

Also, tesseract can create a PDF from images with the OCR text embedded. It is also built into k2pdfopt.
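
For reference, the tesseract invocation for a searchable PDF looks roughly like this. A small sketch assuming the `tesseract` binary is on PATH and the English language data is installed; the helper name and file names are only illustrative.

```python
def tesseract_pdf_cmd(image, out_base, lang="eng"):
    """Build the tesseract argument list for a PDF with an embedded OCR text layer."""
    # The trailing "pdf" selects PDF output; tesseract appends ".pdf" to out_base itself.
    return ["tesseract", image, out_base, "-l", lang, "pdf"]
```

Passing the result for `("scan.png", "scan")` to `subprocess.run` would be expected to write `scan.pdf` next to the input.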


Also take a look at Okular. It's the only PDF reader I've seen to let you select a specific column in a table.


Okular is my go-to document reader across operating systems. In addition to PDF, it can open EPUB, DjVu, JPEG, PNG, GIF, TIFF, WebP, CBR, CBZ, DVI, XPS, ODT and other formats.


May I recommend NAPS2? https://www.naps2.com/

It's like PDF4QT but works and feels better to me.


You can also use Okular to open and edit PDFs, it's the document viewer from KDE.


The Adobe Acrobat Reader installer is also almost a 1 GB download these days. One thing I do find that Acrobat does better is compression. I can usually reduce a PDF down to about 30%-40% of its original size without much loss in quality. I've tried other tools and they haven't worked nearly as well.


META: What's the best way to convert pdfs to CSV/excel? Any new LLM tool?


I think the best way to convert pdfs to tabular data is using pdfplumber together with a pandas (dataframes) workflow and writing results to CSV.
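
A minimal sketch of that workflow, with one swap: it uses the standard-library `csv` module instead of pandas so the CSV part runs without extra installs. `pdfplumber.open` and `page.extract_tables()` are pdfplumber's own calls; the function names here are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize table rows (lists of cells) to CSV text; None cells become empty."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if cell is None else cell for cell in row])
    return buf.getvalue()


def pdf_tables_to_csv(pdf_path):
    """Return one CSV string per table that pdfplumber finds in the PDF."""
    import pdfplumber  # third-party; imported here so rows_to_csv works without it

    out = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() yields each table as a list of rows of cell strings.
            for table in page.extract_tables():
                out.append(rows_to_csv(table))
    return out
```

With pandas, the inner step would instead be something like `pd.DataFrame(table[1:], columns=table[0]).to_csv(...)`, assuming the first row of each table is the header.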


ABBYY FineReader 8.0(!!!)

good luck finding it


It's funny that you are downvoted, but FR8 was way better at OCRing office-printer-scanned documents, even compared with much later versions of FR; I saw the comparison on the same source documents.

> good luck finding it

It's still available on the trackers.


This is a really obscure recommendation. Can anyone who knows about the specifics expound?


Tabula


To edit PDFs in Figma, there is pdf.to.design now. https://pdf.to.design


I use xournal to add signatures (just pngs) and text to pdfs.


Pdf-tools in Emacs is also great.


And it has a killer feature:

pdf-view-themed-minor-mode

It matches the PDF style/colors to your Emacs theme! Sort of like a dark reader for PDFs, but it automatically adjusts to any theme based on some good but likely imperfect heuristics.


How can I sign and date a PDF with a bitmap signature without 9999 janky clicks?



