
Phantomjs (http://phantomjs.org) might be a better fit, perhaps with casper (http://casperjs.org) on top to handle some of the glue code. Phantomjs is a full headless browser (it'll give you screenshots of the pages it downloads if you want them); casper is a library that makes sequencing tasks somewhat easier.

Node, by itself, doesn't have full versions of a lot of the objects that Javascript on the pages would refer to (DOM, event model, etc.); phantomjs gives you all of that.
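To make the difference concrete, here is a minimal sketch of a Phantomjs script that loads a page, reads a value from the live DOM, and saves a screenshot. It assumes the standard webpage module (create, open, evaluate, render); the function name titleAndScreenshot, the URL, and the output file name are made up for illustration. The webpage module is passed in as an argument only so the function can be exercised with a stub outside Phantomjs.

```javascript
// Intended to run under PhantomJS (e.g. `phantomjs title.js`), not plain Node.
// titleAndScreenshot is a hypothetical helper name, not part of PhantomJS.
function titleAndScreenshot(webpage, url, pngPath, done) {
  var page = webpage.create();
  page.open(url, function (status) {
    if (status !== 'success') {
      done(new Error('failed to load ' + url));
      return;
    }
    // evaluate() runs this function inside the page, where `document`
    // is a real DOM rather than something Node would have to emulate.
    var title = page.evaluate(function () {
      return document.title;
    });
    // render() writes a screenshot of the loaded page to disk.
    page.render(pngPath);
    done(null, title);
  });
}

// Only kick things off when the PhantomJS global is present.
if (typeof phantom !== 'undefined') {
  titleAndScreenshot(require('webpage'), 'http://example.com/', 'example.png',
    function (err, title) {
      console.log(err ? err.message : title);
      phantom.exit(err ? 1 : 0);
    });
}
```

The same page.evaluate call is where page-specific scraping logic would go, since anything the site's own client-side JS has built is visible there.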




Or do what we do at Hubdoc, and use both Node and Phantom: Node for performance where that's possible, and Phantom where the site has been built in such a way that scraping it in Node isn't worth the effort of figuring out all the weird stuff they've done in client-side JS.

We maintain a Node to Phantom bridge for this: https://github.com/baudehlo/node-phantom-simple


Curious: you use the webserver module in phantomjs, is that right? And that's how you do the inter-process communication? I'm curious how you chose that over websockets, or over HTTP polling from your phantomjs client against a local node server.

What about using something like node-gir, or whatever appjs does to combine the event loops of node/v8 and chromium/v8?
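For reference, the HTTP polling alternative mentioned above could look roughly like this on the Node side. This is only a sketch of that option using Node's built-in http module; it is not how node-phantom-simple works, and the JobQueue type, the /next and /result/<id> routes, and the job shape are all invented for illustration.

```javascript
var http = require('http');

// In-memory queue of scrape jobs. The { id, url } shape is hypothetical.
function JobQueue() {
  this.pending = [];
  this.results = {};
  this.nextId = 1;
}
JobQueue.prototype.add = function (url) {
  var job = { id: this.nextId++, url: url };
  this.pending.push(job);
  return job.id;
};
// Hand out the oldest pending job, or null when there is nothing to do.
JobQueue.prototype.take = function () {
  return this.pending.shift() || null;
};
JobQueue.prototype.complete = function (id, result) {
  this.results[id] = result;
};

// GET  /next        -> next job as JSON, or 204 when the queue is empty
// POST /result/<id> -> stores the JSON request body as that job's result
function createBridgeServer(queue) {
  return http.createServer(function (req, res) {
    if (req.method === 'GET' && req.url === '/next') {
      var job = queue.take();
      if (!job) { res.writeHead(204); res.end(); return; }
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(job));
      return;
    }
    var m = /^\/result\/(\d+)$/.exec(req.url);
    if (req.method === 'POST' && m) {
      var body = '';
      req.on('data', function (chunk) { body += chunk; });
      req.on('end', function () {
        try {
          queue.complete(Number(m[1]), JSON.parse(body));
          res.writeHead(204);
        } catch (e) {
          res.writeHead(400); // body was not valid JSON
        }
        res.end();
      });
      return;
    }
    res.writeHead(404);
    res.end();
  });
}
```

To try it, something like createBridgeServer(queue).listen(8910, '127.0.0.1') would start it on an arbitrary local port, and the phantomjs side would poll /next on a timer. The obvious cost compared to a push-style channel is the latency and wasted requests between polls.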


I wrote a backend system using Phantom and Akka that generates graphs with D3, rasterizes them into PNGs, and puts them into user-specific emails.

Phantom has some quirks but overall it's pretty solid.



