Side question: so you found the best way to generate PDFs was Chrome? I’ve recen...

jrockway · on April 21, 2019

I ended up using Puppeteer, wrapped in a Node app that accepts gRPC requests containing the various static files and returns the bytes of the rendered PDF. I did not dig fully into figuring out the best way to deal with the Chrome sandbox; I just gave my container the SYS_ADMIN capability, which I am sure I will someday regret. Full details are available here: https://github.com/GoogleChrome/puppeteer/blob/master/docs/t...

I see no reason not to open-source it but I haven't bothered to do so. Send me an email (in profile) and I'll see to it happening. (It is easy to write. All the complexity is dealing with gRPC's rather awkward support for node, opentracing, prometheus, and all that good stuff. If you don't use gRPC, opentracing, and prometheus... you can just cut-n-paste the example from their website. My only advice is to wait for DOMContentLoaded to trigger rendering, rather than the default functionality of waiting 500ms for all network actions to complete. Using DOMContentLoaded, it's about 300ms end-to-end to render a PDF. With the default behavior, it's more than 1 second.)
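A minimal sketch of the DOMContentLoaded advice above, not the actual service described in this comment. It assumes the HTML is handed to the page with Puppeteer's `page.setContent` and rendered with `page.pdf`, both called with the `waitUntil` and `format` options that Puppeteer documents. The `PageLike` interface and the `waitUntilFor` and `renderPdf` helpers are hypothetical names made up for illustration; the interface only mirrors the two Puppeteer methods used, so the snippet type-checks without the package installed. The 500ms network wait mentioned above appears to correspond to Puppeteer's `networkidle0` / `networkidle2` settings.

```typescript
// Load events accepted by Puppeteer's `waitUntil` option.
type WaitUntil = "load" | "domcontentloaded" | "networkidle0" | "networkidle2";

// Hypothetical structural subset of Puppeteer's Page: only the two
// methods this sketch calls. A real Page is meant to be passed in here.
interface PageLike {
  setContent(html: string, options?: { waitUntil?: WaitUntil }): Promise<void>;
  pdf(options?: { format?: string; printBackground?: boolean }): Promise<Uint8Array>;
}

// Hypothetical helper: pick the event that gates rendering.
// "domcontentloaded" fires once the HTML is parsed; "networkidle0"
// additionally waits until there has been no network activity for 500ms.
function waitUntilFor(needsNetworkAssets: boolean): WaitUntil {
  return needsNetworkAssets ? "networkidle0" : "domcontentloaded";
}

// Load the HTML into the page, wait for the chosen event, then return
// the PDF bytes. Defaults to the faster DOMContentLoaded behaviour.
async function renderPdf(
  page: PageLike,
  html: string,
  needsNetworkAssets = false
): Promise<Uint8Array> {
  await page.setContent(html, { waitUntil: waitUntilFor(needsNetworkAssets) });
  return page.pdf({ format: "A4", printBackground: true });
}
```

With the real library, the page would come from something like `const browser = await puppeteer.launch(); const page = await browser.newPage();`. Skipping the network-idle wait is only safe if everything the document needs is already inline or loaded by the time DOMContentLoaded fires; otherwise late-loading images or fonts may be missing from the PDF.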

Before Puppeteer I tried to make wkhtmltopdf work... but it has a lot of interesting quirks. For example, the version on my Ubuntu workstation understands modern CSS like flexbox, but the version that I was able to get into a Docker container didn't. (That would be the "patched qt" version in Docker, versus the "unpatched qt" version on Ubuntu. Obviously I could just run the Ubuntu version in a Docker container, but at that point I decided it was probably not going to be The Solution That Lasts For A While and we would eventually run into some HTML/CSS feature that wkhtmltopdf didn't support, so I decided to suck it up and just run Chrome.)

The main reason I didn't consider Puppeteer immediately is that Chrome on my workstation always uses like 100 billion terabytes of RAM. In production, we use t3.medium machines with 4GB of RAM. I was worried that it was going to be very bloated and not fit on those machines. I was pleasantly surprised to see that it only uses around 200MB of RAM when undergoing a stress test.

philliphaydon · on April 21, 2019

I have a C# Lambda in AWS for taking screen grabs and PDFs of pages. If the service is running and hasn’t idled out, it takes ~2 seconds. It takes about 8-15 seconds on the first run, which is something I’m willing to accept.

aidos · on April 21, 2019

There’s CEF, which is effectively Chrome as a library (it’s one of the targets for the build process, e.g. Mac, Windows, iPhone, CEF). There are various projects that then build on top of it, like CEF Python.