
Doesn't work at all with JS.

That matters, because many sites now rely on JS to render their content.

Also, since that is the case, you could build this in a few hours using something like https://github.com/bda-research/node-crawler. Yes, it would have no GUI, so you would lose that.




For sites that load data using AJAX, we recommend you take a look at our Chrome extension (https://wrapapi.com/#/chromePlugin). Our philosophy isn't to run a full headless browser (like Phantom), but rather to make it really easy to find the AJAX requests that actually load the data you need.
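To illustrate the general idea of hitting the AJAX endpoint directly rather than rendering the page, here is a minimal Python sketch. It is not how WrapAPI works internally. It assumes you have already found a request in the browser's network tab that returns JSON; the names `fetch_json` and `pick_fields`, the `items` key, and the field names are all hypothetical.

```python
import json
import urllib.request


def fetch_json(url, timeout=10.0):
    """Fetch a URL and decode the response body as JSON (does network I/O)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            # Assumption: some sites expect this header on their AJAX calls.
            "X-Requested-With": "XMLHttpRequest",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))


def pick_fields(payload, list_key, fields):
    """Return one dict per record under payload[list_key], keeping only `fields`.

    Missing fields come back as None; a missing list_key yields an empty list.
    """
    records = payload.get(list_key, [])
    return [{name: record.get(name) for name in fields} for record in records]
```

Usage would look something like `pick_fields(fetch_json("https://example.com/api/items?page=1"), "items", ["id", "title"])`, where the URL is a placeholder for whatever request you found. The upside is that no browser is involved at all; the downside is that it breaks whenever the site changes its private endpoint.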


If JS is a problem for you, try Kantu. It works with screenshots and uses OCR for scraping. The beauty is that it works with any kind of site. But clearly, the speed cannot match a Node.js or Perl based scraper (mechanize etc.), so it is not suitable for high volumes.


Do you find it better than Phantom?

Just reading about Kantu now. It reminds me of http://www.sikuli.org/


Yeah, the concept is the same as Sikuli, but all inside Chromium (and the OCR is better).

>Do you find it better than Phantom?

It depends. Once you have a working script, web scraping with Phantom is much faster and much more resource efficient. But since Kantu works visually, you do not have to touch any page source code. That makes it much easier and faster to create the automation in the first place, especially for complex sites with date controls, drag & drop and other JavaScript.


I've had issues scraping content generated by JS. I just ran the page through PhantomJS, then extracted the rendered HTML.
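A rough Python sketch of that render-then-extract flow, under some loud assumptions: `phantomjs` is on your PATH, and `render.js` is a hypothetical script of your own that opens the URL, waits for the page to load, and prints `page.content` to stdout. The extraction step uses only the standard library and simply collects links as an example; `render_with_phantomjs`, `LinkCollector` and `extract_links` are made-up names.

```python
import subprocess
from html.parser import HTMLParser


def render_with_phantomjs(url, script_path="render.js", timeout=60):
    """Run `phantomjs <script> <url>` and return whatever the script prints.

    Assumption: the (hypothetical) script writes the rendered HTML to stdout.
    """
    result = subprocess.run(
        ["phantomjs", script_path, url],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse already-rendered HTML and return a list of (href, text) tuples."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Something like `extract_links(render_with_phantomjs("https://example.com/"))` would tie the two steps together. The parsing half works on any HTML string, so you can swap in another renderer without touching it.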



