OK, one thing I'm confused about... what's the advantage of scraping with node/jquery over a traditional scripting setup like Ruby + Nokogiri or Mechanize?

It's true that this process won't render an Ajax-heavy page the way your browser will, but I've found that if you inspect the page to work out the address and parameters of the backend scripts, you often don't have to pull HTML at all. You just hit the scripts directly and feed them parameters (or use Mechanize if cookie/state tracking is involved).
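For example, something like this in node (a rough sketch; the endpoint, parameters and JSON response here are all made up):

    // Skip the HTML and call the backend endpoint that feeds the page.
    // The host, path and query parameters below are hypothetical.
    var https = require('https');

    var options = {
      hostname: 'example.com',
      path: '/api/listings?category=widgets&page=1'
    };

    https.get(options, function (res) {
      var body = '';
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        var data = JSON.parse(body);  // many such endpoints return JSON directly
        console.log(data);
      });
    }).on('error', function (err) {
      console.error(err);
    });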

Part of the advantage is that you have CSS selectors to examine the document (I don't know whether Ruby/Mechanize offers that - I'm just saying what's good about node+jquery), and you're working in a language every web developer already knows. So it's about minimising the friction of moving from front-end web work to scraping work. At my company that's a financial advantage - we can hire basic jQuery web developers to work on our scrapers.
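To make that concrete, here's a rough sketch using cheerio, which gives you a jQuery-style selector API in node (the markup and selectors are made up; a jsdom+jquery setup looks much the same at the selector level):

    // Load fetched HTML and pull data out with CSS selectors,
    // the same way you would in front-end jQuery code.
    var cheerio = require('cheerio');

    // Stand-in for HTML you've already downloaded.
    var html =
      '<ul class="stories">' +
      '<li class="story"><a href="/item/1">First story</a></li>' +
      '<li class="story"><a href="/item/2">Second story</a></li>' +
      '</ul>';

    var $ = cheerio.load(html);

    $('.story a').each(function (i, el) {
      console.log($(el).text(), '->', $(el).attr('href'));
    });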