I have been aiming for default settings that render the site as you see it (screenshot style), while still letting you switch the settings to honor @media print CSS rules.
One of the biggest values I think this yet-another-PDF-service has is that if you open the print preview on a desktop Chrome, it should be really close to what the API renders. Should make debugging a bit easier.
The main use case is to render content generated by yourself, e.g. receipts and invoices, but I don't see a reason why it couldn't be used for rendering news or blog articles.
If PDF is not really important for you and HTML is fine, take a look at SingleFile [1]. This is a Chrome extension I wrote some years ago to save a page and all its resources in a single HTML file.
Scaling is definitely a challenge. Rendering image-heavy sites requires quite a lot of RAM. But as others have stated, the good news is that you can quite easily scale this horizontally by adding more servers. There's no shared state between the server instances behind a load balancer.
There's also room for improvement in how efficiently a single server instance can render PDFs. The API doesn't yet support resource pooling, which would make it possible to reuse the same Chrome process (with e.g. 4 tabs). The implementation requires careful consideration, since in that model it's possible to accidentally leak content from previous requests to new requesters.
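To make the pooling concern concrete, here is a minimal sketch of a fixed-size pool that wipes each tab before handing it back. It is not how the API works today (pooling isn't implemented), and `Tab`, `TabPool`, `load`, and `reset` are hypothetical names made up for illustration; a real version would have to clear cookies, storage, and cache in the actual browser.

```python
import queue
from contextlib import contextmanager


class Tab:
    """Hypothetical stand-in for a browser tab (not a real Puppeteer object)."""

    def __init__(self, tab_id):
        self.tab_id = tab_id
        self.content = None  # whatever the last request loaded

    def load(self, url):
        self.content = f"rendered:{url}"

    def reset(self):
        # A real implementation would also clear cookies, storage and cache.
        self.content = None


class TabPool:
    """Fixed-size pool; callers block until a tab is free."""

    def __init__(self, size=4):
        self._tabs = queue.Queue()
        for i in range(size):
            self._tabs.put(Tab(i))

    @contextmanager
    def acquire(self, timeout=None):
        tab = self._tabs.get(timeout=timeout)
        try:
            yield tab
        finally:
            tab.reset()          # wipe state before the next requester sees it
            self._tabs.put(tab)  # return the tab even if rendering failed


pool = TabPool(size=4)
with pool.acquire() as tab:
    tab.load("https://example.com/invoice/1")
```

The important part is that the reset happens in `finally`, so a failed render can't leave the previous request's content in a tab that goes back into the pool.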
That's a good idea! You can achieve this by adding e.g. &pdf.width=1000px&pdf.height=10000px parameters.
Sometimes you can get rid of the sticky headers with the &emulateScreenMedia=false parameter, if the page has well-implemented @media print rules in its CSS. We decided to use page.emulateMedia('screen') with Puppeteer to make PDFs look more like the actual web page by default.
Pages which use lazy loading for images may look incorrect when rendered. The &scrollPage=true parameter may help with this. It scrolls the page to the bottom before rendering the PDF.
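For reference, here is a rough sketch of how the query parameters mentioned above could be combined into one request URL. The host and the `/api/render` path are assumptions for illustration, so point `BASE` at your own deployment; `render_url` is a made-up helper, and only the parameter names come from this thread.

```python
from urllib.parse import urlencode

# Assumption: hypothetical host and endpoint path, adjust to your deployment.
BASE = "https://pdf.example.com/api/render"


def render_url(page_url, options=None):
    """Build a render request URL; option names are passed through as-is."""
    params = {"url": page_url}
    for name, value in (options or {}).items():
        # Send booleans as "true"/"false" to match the parameter form used above.
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{BASE}?{urlencode(params)}"


# One very tall page, print CSS honored, lazy-loaded images triggered by scrolling.
print(render_url(
    "https://example.com/blog",
    {
        "pdf.width": "1000px",
        "pdf.height": "10000px",
        "emulateScreenMedia": False,
        "scrollPage": True,
    },
))
```

`urlencode` percent-encodes the target URL, so a page address containing its own `?` or `&` doesn't get mixed up with the render options.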
Thanks for the comment! I haven't personally used wkhtmltopdf much, but I like having Chrome as the rendering engine. In theory at least, debugging the PDFs can be done with desktop Chrome's print preview. I don't know about wkhtmltopdf, but url-to-pdf-api supports dynamic single-page apps, which can be beneficial depending on the use case.
Headless Chrome is quite new so it still has some bugs, but I have a hunch that in the end it will have the most reliable and predictable render results.
I am not at a computer now, so I can’t test it, but do you take redirects into account? I hope you are not just whitelisting the initial URL, but also any URLs it redirects to. If you don’t already, you should probably just disable redirects in whatever library you use.
I gave this some thought. Since we're using a real browser, there is a huge number of ways to get the browser to display a file:// link. A redirect is one, window.location.href is another, etc. The service shouldn't be run publicly on the internet for real use cases. If you do run it publicly, the server should be designed so that it's not dangerous if the web server user gets read access to the file system. I added a warning about this at the top of the README.
Not sure if this is sound advice. Blacklisting is a cat and mouse game, especially for security. The risk of a missing entry on a blacklist is worse than on a whitelist.
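A minimal sketch of the whitelist approach, assuming you already have the list of URLs a request was redirected through. `ALLOWED_HOSTS`, `is_allowed`, and `check_chain` are hypothetical names. As noted above, a real browser can reach file:// in other ways (e.g. window.location.href), so a check like this is only one layer and not a complete defense.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"example.com", "www.example.com"}  # hypothetical whitelist


def is_allowed(url):
    """Accept a URL only if both its scheme and host are explicitly listed."""
    parts = urlsplit(url)
    # file:// URLs have no hostname (None), so they fail the host check too.
    return parts.scheme in ALLOWED_SCHEMES and parts.hostname in ALLOWED_HOSTS


def check_chain(urls):
    """Every hop of a redirect chain must pass, not only the initial URL."""
    return all(is_allowed(u) for u in urls)


print(is_allowed("https://example.com/receipt/1"))
print(check_chain(["https://example.com/a", "file:///etc/passwd"]))
```

With a whitelist, a forgotten entry means a legitimate URL gets rejected; with a blacklist, a forgotten entry means a dangerous URL gets through.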
I forwarded this to the main contributors, thanks for the good feedback. Pepperoni is more of a starter kit to kickstart your React Native development as fast as possible. You get the Redux architecture and other solid pieces, but there's no internal framework API. I hope that explains the core idea.
That idea is explained well, but I agree with OP in that a working example would do wonders.
Is this a boilerplate repo that I should clone and build off of? Is this a project generator like rake? Is this a library I should npm install to depend on?
I agree with this. The page says it's a framework, but I feel like it's describing a kickstarter/generator. That said, it looks interesting, so I will bookmark Pepperoni and give it a go over the weekend.