Scaling is definitely a challenge. Rendering image-heavy sites requires quite a lot of RAM. But as others have stated, the good news is that you can quite easily scale this horizontally by adding more servers. There's no shared state between the server instances behind a load balancer.
There's also room for improvement in how efficiently a single server instance can render PDFs. The API doesn't yet support resource pooling, which would make it possible to reuse the same Chrome process (with e.g. 4 tabs) across requests. The implementation requires careful consideration, since in that model it's possible to accidentally expose content from previous requests to new requesters.
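To make the pooling idea concrete, here is a minimal sketch, assuming a fixed pool of four tabs in one long-lived browser process. `Tab`, `TabPool` and `render_pdf` are hypothetical stand-ins and not part of the API under discussion. The point is only that a tab gets wiped before it goes back into the pool, so the next request cannot see leftovers from the previous one.

```python
import queue
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tab:
    """Hypothetical stand-in for one tab in a shared Chrome process."""
    tab_id: int
    state: dict = field(default_factory=dict)  # cookies, storage, loaded page

    def reset(self) -> None:
        # Wipe everything the previous request left behind.
        self.state.clear()

    def render(self, url: str) -> str:
        # Placeholder for real rendering; records state like a real tab would.
        self.state["last_url"] = url
        return f"pdf-of:{url}"


class TabPool:
    """Fixed-size pool; a tab is always reset before it is handed out again."""

    def __init__(self, size: int = 4) -> None:
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(Tab(tab_id=i))

    @contextmanager
    def acquire(self, timeout: Optional[float] = None):
        tab = self._idle.get(timeout=timeout)  # blocks while all tabs are busy
        try:
            yield tab
        finally:
            tab.reset()  # runs even if rendering raised
            self._idle.put(tab)


def render_pdf(pool: TabPool, url: str) -> str:
    with pool.acquire() as tab:
        return tab.render(url)
```

With a real browser the reset step is the hard part: cookies, caches, service workers and local storage would all need clearing, which is why a separate browser context per request is probably the safer route.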
You can actually reuse a running instance and create a new context with this: https://chromedevtools.github.io/devtools-protocol/tot/Targe.... The issue is that most libraries don't have an API for this (not sure why), and that long-running Chrome instances can get into a quirky state, among other issues. Certain parameters also require you to start Chrome with the right flags, so reusing a running Chrome process doesn't always work.
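For anyone who wants to try this without library support, here is a rough sketch of the protocol messages involved. It assumes the link above refers to the browser-context methods of the DevTools `Target` domain (`Target.createBrowserContext`, `Target.createTarget`, `Target.disposeBrowserContext`). The code only builds the JSON payloads; sending them over the browser's WebSocket debugging endpoint is left out, and the helper names are made up for illustration.

```python
import itertools
import json
from typing import Optional

# Every DevTools protocol command carries a client-chosen, unique integer id.
_ids = itertools.count(1)


def cdp_message(method: str, params: Optional[dict] = None) -> str:
    """Serialize one DevTools protocol command as a JSON string."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


def create_context() -> str:
    # The response to this command contains a "browserContextId".
    return cdp_message("Target.createBrowserContext")


def open_tab(browser_context_id: str, url: str) -> str:
    # Opens a new tab inside the isolated context instead of the default one.
    return cdp_message(
        "Target.createTarget",
        {"url": url, "browserContextId": browser_context_id},
    )


def dispose_context(browser_context_id: str) -> str:
    # Closes the context's tabs and discards its cookies and storage.
    return cdp_message(
        "Target.disposeBrowserContext",
        {"browserContextId": browser_context_id},
    )
```

The flow per request would be: create a context, open a tab in it, print to PDF, dispose of the context. That doesn't help with the startup-flag problem, though.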
I suspect that "extracting" this part of Chrome into a library would result in the same thing: a library that starts up the full Chrome environment and prints a page to PDF.
The PDF-ization isn't the part that's hard to extract (there are libraries to create PDFs from scratch already available, and they're small/intelligible). Rather, it's the rendering of a webpage for display that's the hard part, and what most of the code in any web browser is concerned with. Whether that display is a monitor or a PDF doesn't change much.
I've looked and it's not easy. There are different ways you can wrap the internals of Chrome, but really, it's hard (and, from memory, officially not recommended) to pull out just a subset to work with.
It is scalable. It scales linearly (and for practical purposes indefinitely) with the amount of money you spend on AWS Lambda. It might not have a nice constant factor, but it is scalable.
This is just pedantic. Anything can be scalable using that definition. Heck, I could hire a 3rd world worker to manually draw the PDFs, scan them, and put them on a server, and it would "scale linearly with the amount of money I spend" on labor.
No, some things can't be, like badly architected monoliths, or databases. Especially with databases it's not a given, which is why over the last several years things like the "MongoDB is web scale" meme blew up and people started mindlessly asking "is it scalable?" (which, as you figured out, means very little for a lot of systems). I'm also pretty sure that your example scales worse than linearly, since you have to introduce multiple levels of management at some point.
"scalable" = "can it be scaled", which is not a given for every system
Redrawing the PDFs and scanning them would likely not scale linearly with the amount of money you spend on labor. Labor tends to have diminishing marginal returns. The cost of hiring two workers is more than double that of hiring one worker because there is additional complexity in coordinating the workers.
Also, for the record, this comment I'm making right now is just pedantic.
If there is only one worker the requests can flow directly to that one worker. If there are two workers, something would need to sit in the middle and determine how to distribute the requests. That extra hop is where the extra complexity comes from.
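As a toy illustration of that middle layer, here is a minimal round-robin dispatcher. The class and the worker names are made up; a real setup would use a load balancer or a job queue rather than anything like this.

```python
from itertools import cycle


class RoundRobinDispatcher:
    """The extra hop: decides which worker receives each incoming request."""

    def __init__(self, workers):
        if not workers:
            raise ValueError("need at least one worker")
        self._next_worker = cycle(workers)

    def dispatch(self, request):
        # Hand requests to the workers in strict rotation.
        worker = next(self._next_worker)
        return worker, request
```

With a single worker the dispatcher is a no-op, which is the point: the coordination cost only shows up once there are two or more.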
Web rendering is hard. You'll probably have to scale horizontally at some point, and balance load between servers. But a microservice architecture makes this not terrible.
At least with PhantomJS I felt like my system would begin to lock up if there were too many instances rendering at the same time (and it didn't appear to be an issue of too little memory).
Nonetheless, this looks promising.