
I wonder how scalable the service is...

At least with PhantomJS I felt like my system would begin to lock up if there were too many instances rendering at the same time (and it didn't appear to be an issue of too little memory).

Nonetheless, this looks promising.




Scaling is definitely a challenge. Rendering image-heavy sites requires quite a lot of RAM. But as others have stated, the good news is that you can quite easily scale this horizontally by adding more servers: there's no shared state between the server instances behind a load balancer.

There's also room for improvement in how efficiently a single server instance can render PDFs. The API doesn't yet support resource pooling, which would make it possible to reuse the same Chrome process (with e.g. 4 tabs). The implementation requires careful consideration, since in that model it's possible to accidentally serve content from a previous request to a new requester.
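Roughly, a sketch of that pooling idea with puppeteer (pool size and names are illustrative, not this service's actual API):

    import puppeteer, { Browser, Page } from 'puppeteer';

    // One Chrome process, a fixed pool of reusable tabs.
    class PagePool {
      private idle: Page[] = [];
      private waiters: Array<(p: Page) => void> = [];

      static async create(browser: Browser, size: number): Promise<PagePool> {
        const pool = new PagePool();
        for (let i = 0; i < size; i++) pool.idle.push(await browser.newPage());
        return pool;
      }

      acquire(): Promise<Page> {
        const page = this.idle.pop();
        if (page) return Promise.resolve(page);
        return new Promise(resolve => this.waiters.push(resolve));
      }

      release(page: Page): void {
        const waiter = this.waiters.shift();
        if (waiter) waiter(page);
        else this.idle.push(page);
      }
    }

    async function renderPdf(pool: PagePool, url: string): Promise<Buffer> {
      const page = await pool.acquire();
      try {
        await page.goto(url, { waitUntil: 'networkidle0' });
        return await page.pdf({ format: 'A4' });
      } finally {
        // Reset the tab so one request's content can't leak into the next.
        await page.goto('about:blank');
        pool.release(page);
      }
    }

    // Usage: const browser = await puppeteer.launch();
    //        const pool = await PagePool.create(browser, 4);

The about:blank reset in the finally block is exactly the "careful consideration" part: forget it and request N+1 can see request N's page.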


> Rendering image-heavy sites requires quite a lot of RAM.

Is that the limiting factor? How many would you optimally do in parallel if RAM wasn't an issue?


You need to fire up a new headless Chrome process every time you create a PDF; it's not scalable, but it works.

I wonder if this part of Chrome could be easily extracted as a C++ library.
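For what it's worth, the process-per-PDF approach is just a one-shot invocation of Chrome's --print-to-pdf flag; a rough sketch (the binary name varies by platform, e.g. google-chrome vs. chromium):

    import { execFile } from 'child_process';
    import { promisify } from 'util';

    const run = promisify(execFile);

    // One Chrome process per PDF: isolated and simple, but heavyweight.
    async function renderPdf(url: string, outPath: string): Promise<void> {
      await run('google-chrome', [
        '--headless',
        '--disable-gpu',
        `--print-to-pdf=${outPath}`,
        url,
      ]);
    }

    // Usage: await renderPdf('https://example.com', '/tmp/out.pdf');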


You can actually reuse a running instance and create a new context with this: https://chromedevtools.github.io/devtools-protocol/tot/Targe.... The issue is that most libraries don't expose an API for this (not sure why), and that long-running Chrome instances can get into a quirky state, among other issues. Certain features also require that Chrome was started with the right flags, so reusing a running Chrome process doesn't always work.
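As a sketch of what that looks like in practice: puppeteer's createIncognitoBrowserContext wraps that same Target.createBrowserContext call, giving each request an isolated cookie/storage context inside one long-running Chrome (the endpoint placeholder below would come from /json/version on the debugging port):

    import puppeteer from 'puppeteer';

    // Attach to a Chrome already running with --remote-debugging-port=9222.
    const browser = await puppeteer.connect({
      browserWSEndpoint: 'ws://127.0.0.1:9222/devtools/browser/<id>',
    });

    // Issues Target.createBrowserContext under the hood: cookies and
    // storage are isolated from every other context in this browser.
    const context = await browser.createIncognitoBrowserContext();
    const page = await context.newPage();
    await page.goto('https://example.com', { waitUntil: 'networkidle0' });
    const pdf = await page.pdf({ format: 'A4' });

    // Tears down the context and its pages, but not the browser itself.
    await context.close();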


I suspect that "extracting" this part of Chrome into a library would result in the same thing: a library that starts up the full Chrome environment and prints a page to PDF.

The PDF-ization isn't the part that's hard to extract (there are libraries to create PDFs from scratch already available, and they're small/intelligible). Rather, it's the rendering of a webpage for display that's the hard part, and what most of the code in any web browser is concerned with. Whether that display is a monitor or a PDF doesn't change much.
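To make the contrast concrete: producing a PDF from scratch really is a few lines with an off-the-shelf library (pdf-lib here, purely as an example). Everything this snippet skips, i.e. turning HTML/CSS into those drawing calls, is the browser-sized part:

    import { writeFileSync } from 'fs';
    import { PDFDocument, StandardFonts } from 'pdf-lib';

    // PDF generation is the easy half: a page is just drawing commands.
    const doc = await PDFDocument.create();
    const font = await doc.embedFont(StandardFonts.Helvetica);
    const page = doc.addPage([595, 842]); // A4, in points
    page.drawText('Hello, PDF', { x: 50, y: 780, size: 24, font });
    writeFileSync('hello.pdf', await doc.save());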


I've looked, and it's not easy. There are different ways you can wrap the internals of Chrome, but it's hard (and, from memory, officially not recommended) to pull out just a subset to work with.


It's using headless Chrome, and there is a serverless version of headless Chrome (chromeless), so it should be pretty scalable.


A Chrome is a Chrome... headless Chrome uses almost the same resources as desktop Chrome (same engine and all that).


Yes, but that doesn't matter with regard to the question OP asked: "Is it scalable?"


But it's not scalable; making it serverless just makes it scalable at a huge cost.


You are contradicting yourself.

It is scalable. It scales linearly (and for practical purposes indefinitely) with the amount of money you spend on AWS Lambda. It might not have a nice constant factor, but it is scalable.


This is just pedantic. Anything can be scalable using that definition. Heck, I could hire third-world workers to manually draw the pages, scan them to PDF using a scanner, and put them on a server, and it would "scale linearly with the amount of money I spend" on labor.


> Anything can be scalable using that definition.

No, some things can't be, like badly architected monoliths, or databases. Especially with databases it's not a given, which is why things like "MongoDB is web scale" blew up a few years ago and people started mindlessly asking "is it scalable" (which, as you figured out, means very little for a lot of systems). I'm also pretty sure that your example scales worse than linearly, since you'd have to introduce multiple levels of management at some point.

"scalable" = "can it be scaled", which is not a given for every system


Redrawing the PDFs and scanning them would likely not scale linearly with the amount of money you spend on labor. Labor tends to have diminishing marginal returns: the cost of hiring two workers is more than double that of hiring one, because there is additional complexity in coordinating the workers.

Also, for the record, this comment I'm making right now is just pedantic.


As long as the workers do not need to coordinate, it should scale.

Round-robin those folks.


If there is only one worker, requests can flow directly to that one worker. If there are two workers, something needs to sit in the middle and decide how to distribute the requests. That extra hop is where the extra complexity comes from.
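Though that "something in the middle" can be as dumb as a counter, which is why stateless workers scale so cleanly; a toy round-robin dispatcher (the addresses are made up):

    // Round-robin: the only shared state is an index, so adding
    // workers is close to free as long as they don't coordinate.
    class RoundRobin<T> {
      private i = 0;
      constructor(private readonly workers: T[]) {}
      next(): T {
        const w = this.workers[this.i];
        this.i = (this.i + 1) % this.workers.length;
        return w;
      }
    }

    const backends = new RoundRobin(['10.0.0.1:9000', '10.0.0.2:9000']);
    backends.next(); // 10.0.0.1:9000
    backends.next(); // 10.0.0.2:9000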


Kanban


Web rendering is hard. You'll probably have to scale horizontally at some point and balance load between servers, but a microservice architecture makes this not terrible.



