Perhaps better phantomjs (http://phantomjs.org), perhaps using casper (http://ca...

baudehlo · on July 22, 2013

Or do what we do at Hubdoc and use both Node and Phantom: Node for performance where it's possible, and Phantom where a site has been built in such a way that scraping it in Node isn't worth the effort of figuring out all the weird stuff they've done in client-side JS.

We maintain a Node to Phantom bridge for this: https://github.com/baudehlo/node-phantom-simple
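
To make the split concrete, here is a rough sketch of the "fast path first, browser as fallback" idea. It is not Hubdoc's actual code and does not use the node-phantom-simple API. The `Scraper` type, the `HybridScraper` class, and the per-host memory are all hypothetical names made up for illustration, and the scrapers are synchronous only to keep the example short (real ones would be asynchronous).

```typescript
// Hypothetical scraper signature: returns the extracted data, or null when
// the page could not be scraped this way. Synchronous only for brevity.
type Scraper = (url: string) => string | null;

class HybridScraper {
  private fast: Scraper;    // e.g. plain HTTP fetch + HTML parsing in Node
  private browser: Scraper; // e.g. a headless browser such as Phantom
  // Hosts where the fast path has already failed once.
  private needsBrowser: Record<string, boolean> = Object.create(null);

  constructor(fast: Scraper, browser: Scraper) {
    this.fast = fast;
    this.browser = browser;
  }

  scrape(host: string, url: string): string | null {
    if (!this.needsBrowser[host]) {
      const result = this.fast(url);
      if (result !== null) {
        return result;
      }
      // Remember the failure so later requests skip the fast path.
      this.needsBrowser[host] = true;
    }
    return this.browser(url);
  }
}
```

The only design choice here is remembering which hosts needed the browser, so the cheap path is not retried on every request for a site that is known to depend on client-side JS.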

kanzure · on July 22, 2013

Curious: you use the webserver module in PhantomJS, is that right? And that's how you do the inter-process communication? How did you choose that over WebSockets, or over HTTP polling from your PhantomJS client against a local Node server?
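
For anyone unsure what the polling alternative would look like, here is a minimal, transport-agnostic sketch of the Node side. It is an assumption-laden illustration, not how node-phantom-simple works: `PollingQueue`, `Job`, and the `/next` and `/result` routes mentioned in the comments are hypothetical, and the HTTP handlers that would wrap these methods are left out.

```typescript
interface Job {
  id: number;
  url: string;
}

class PollingQueue {
  private nextId = 1;
  private pending: Job[] = [];
  private callbacks: Record<number, (result: string) => void> = Object.create(null);

  // Node side: enqueue a page to load and register a completion callback.
  submit(url: string, onDone: (result: string) => void): number {
    const id = this.nextId++;
    this.pending.push({ id: id, url: url });
    this.callbacks[id] = onDone;
    return id;
  }

  // Worker side (say, behind a hypothetical GET /next):
  // take the oldest pending job, or null when there is nothing to do.
  poll(): Job | null {
    const job = this.pending.shift();
    return job === undefined ? null : job;
  }

  // Worker side (say, behind a hypothetical POST /result):
  // deliver a result at most once; returns false for unknown or finished ids.
  complete(id: number, result: string): boolean {
    const callback = this.callbacks[id];
    if (!callback) {
      return false;
    }
    delete this.callbacks[id];
    callback(result);
    return true;
  }
}
```

The trade-off with polling is latency and wasted requests while the queue is empty, which is presumably part of why one would ask about the alternatives.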

What about using something like node-gir, or whatever appjs does to combine the event loops of node/v8 and chromium/v8?

saryant · on July 22, 2013

I wrote a backend system using Phantom and Akka that generates graphs with D3, rasterizes them into PNGs, and puts them into user-specific emails.

Phantom has some quirks but overall it's pretty solid.