Hacker News | kimmobru's comments

I have been aiming for defaults that render the site as you see it (screenshot style), while letting you switch the settings to honor @media print CSS rules instead.

One of the biggest advantages of this yet-another-PDF-service, I think, is that if you open print preview in desktop Chrome, it should be really close to what the API renders. That should make debugging a bit easier.

The main use case is to render content generated by yourself, e.g. receipts and invoices, but I don't see a reason why it couldn't be used for rendering news or blog articles.


If PDF is not really important for you and HTML is fine, take a look at SingleFile [1]. This is a Chrome extension I wrote some years ago to save a page and all its resources in a single HTML file.

[1] https://chrome.google.com/webstore/detail/singlefile/mpiodij...


Scaling is definitely a challenge. Rendering image-heavy sites requires quite a lot of RAM. But as others have stated, the good news is that you can quite easily scale this horizontally by adding more servers. There's no shared state between the server instances behind a load balancer.

There's also room for improvement in how efficiently a single server instance can render PDFs. The API doesn't yet support resource pooling, which would make it possible to reuse the same Chrome process (with e.g. 4 tabs). The implementation requires careful consideration, since in that model it's possible to accidentally leak content from a previous request to a new requester.
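To illustrate the pooling idea, here is a minimal sketch of a generic resource pool. The `Pool` class and its method names are made up for this example; in the real service the resources would be Puppeteer pages (Chrome tabs), but the factory is injected here so the shape can be shown without a browser.

```javascript
// Minimal resource pool sketch: acquire() hands out an idle resource,
// creates a new one up to a fixed size, or waits until one is released.
class Pool {
  constructor(createResource, size) {
    this.createResource = createResource;
    this.size = size;
    this.created = 0;
    this.idle = [];
    this.waiters = [];
  }

  async acquire() {
    if (this.idle.length > 0) return this.idle.pop();
    if (this.created < this.size) {
      this.created += 1;
      return this.createResource();
    }
    // All resources are busy: wait for a release.
    return new Promise(resolve => this.waiters.push(resolve));
  }

  release(resource) {
    // The caller must reset the resource first (e.g. navigate the tab
    // to about:blank), so content never leaks between requests.
    const waiter = this.waiters.shift();
    if (waiter) waiter(resource);
    else this.idle.push(resource);
  }
}
```

The `release()` path is exactly where the content-leak concern lives: a tab has to be wiped clean before it is handed to the next requester.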


> Rendering image-heavy sites requires quite a lot of RAM.

Is that the limiting factor? How many would you optimally do in parallel if RAM wasn't an issue?


That's a good idea! You can achieve this by adding e.g. &pdf.width=1000px&pdf.height=10000px parameters.

Sometimes you can get rid of the sticky headers with the &emulateScreenMedia=false parameter, if the page has well-implemented @media print rules in its CSS. We decided to use page.emulateMedia('screen') with Puppeteer to make PDFs look more like the actual web page by default.

Pages which use lazy loading for images may look incorrect when rendered. The &scrollPage=true parameter may help with this. It scrolls the page to the bottom before rendering the PDF.

Using these options makes the PDF better: https://url-to-pdf-api.herokuapp.com/api/render?url=https://...
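For illustration, a tiny helper that assembles such a request URL. `buildRenderUrl` is a hypothetical name, and the query parameters (`url`, `pdf.width`, `pdf.height`, `emulateScreenMedia`, `scrollPage`) are the ones mentioned in this thread:

```javascript
// Build a render request URL for the API from an options object.
// Uses Node's WHATWG URL API, which handles query-string encoding.
function buildRenderUrl(baseUrl, target, options = {}) {
  const url = new URL('/api/render', baseUrl);
  url.searchParams.set('url', target);
  for (const [key, value] of Object.entries(options)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Example:
// buildRenderUrl('https://url-to-pdf-api.herokuapp.com',
//                'https://example.com',
//                { 'pdf.width': '1000px', scrollPage: true });
```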


I bet that breaks on infinite-scroll pages?


Thanks for the comment! I haven't personally used wkhtmltopdf much, but I like having Chrome as the rendering engine. In theory at least, debugging the PDFs can be done with desktop Chrome's print preview. I don't know about wkhtmltopdf, but url-to-pdf-api supports dynamic single-page apps, which can be beneficial depending on the use case.

Headless Chrome is quite new, so it still has some bugs, but I have a hunch that it will end up having the most reliable and predictable render results.


Oh wow, that's a bit embarrassing :) The URLs are now restricted to http and https only. Thanks for noticing!


I am not at a computer now, so I can't test it, but do you take redirects into account? I hope you are whitelisting not just the initial URL but also any URLs it redirects to. If you don't already, you should probably just disable redirects in whatever library you use.


I gave this some thought. Since we're using a real browser, there are a huge number of ways to get the browser to display a file:// link. A redirect is one, window.location.href is another, etc. The service shouldn't be run publicly on the internet for real use cases. If you do, the server should be designed so that it's not dangerous if the web server user gets read access to the file system. I added a warning about this at the top of the README.
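As a sketch of the scheme restriction mentioned above (`isAllowedUrl` is a hypothetical helper, not the project's actual code), the initial URL can be checked like this before it is handed to the browser:

```javascript
// Whitelist check for the requested URL: only http/https schemes are
// allowed; file:, ftp:, data:, chrome:, etc. are rejected up front.
// Note this guards only the initial URL; redirects and in-page
// navigation still need to be blocked inside the browser itself.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

function isAllowedUrl(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch (err) {
    return false; // not a valid absolute URL
  }
  return ALLOWED_PROTOCOLS.has(parsed.protocol);
}
```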


You are using headless Chrome, so you can use a group policy to add "file://" to the URL blacklist; see http://www.chromium.org/administrators/policy-list-3#URLBlac...


By default, if headless Chrome hits a redirect to a file:// URL, it returns net::ERR_UNSAFE_REDIRECT.

window.location.href = 'file:///' produces the console error "Not allowed to load local resource".


This is ripe for an easter egg if you request a file:// URL. PDFs support compression, right? I wonder how hardened they are against zip bombs...


Consider adding FTP.

It might be better to blacklist file:// rather than trying to have a comprehensive whitelist.


Not sure if this is sound advice. Blacklisting is a cat-and-mouse game, especially in security. A missing entry on a blacklist is far riskier than a missing entry on a whitelist.


I forwarded this to the main contributors, thanks for the good feedback. Pepperoni is more of a starter kit to kickstart your React Native development as fast as possible. You get the Redux architecture and other solid pieces, but there's no internal framework API. I hope that explains the core idea.


That idea is explained well, but I agree with the OP that a working example would do wonders.

Is this a boilerplate repo that I should clone and build off of? Is this a project generator like rake? Is this a library I should npm install to depend on?

How do I actually use it?


I agree with this. The page says it's a framework, but I feel like it's describing a kickstarter/generator. That said, I will bookmark Pepperoni and give it a go during the weekend, it looks interesting.


Thanks for your interest! It's definitely more of a project boilerplate. We have big ideas for the future, but starting out small.

Thanks for the notes on the mixed messaging, we'll tweak the website to be more informative.

