It's neat using jQuery for this, but I've found the arduous part of scraping isn't the actual parsing and extraction of data from your target page. It's everything else: post-processing, working around incomplete data on the page, handling errors gracefully, keeping on top of layout/URL/data changes to the target site, not hitting your target site too often, logging into the target site if necessary, respecting robots.txt, keeping users informed of scraping, sane parallelisation of requests, and the general problems associated with long-running background processes.

All tractable problems with standard solutions, but it's difficult to accept the claim that the idea of using jQuery—which is still pretty neat IMO—now makes scraping easy.
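A minimal Python sketch of three of those standard solutions (respecting robots.txt, throttling so you don't hit the site too often, retrying with backoff on errors). It assumes an injectable `fetch` function so the policy can be tested without a network; `PoliteFetcher`, its parameters and the `example-scraper` user agent are hypothetical names for illustration. Only `urllib.robotparser` is a real stdlib API here.

```python
import time
import urllib.robotparser
from typing import Callable, Iterable, Optional


class PoliteFetcher:
    """Hypothetical helper: robots.txt check, minimum delay between requests, retries."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        robots_lines: Iterable[str],
        user_agent: str = "example-scraper",
        min_delay: float = 1.0,
        max_retries: int = 3,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch = fetch
        self.user_agent = user_agent
        self.max_retries = max_retries
        self.sleep = sleep
        self.clock = clock
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(list(robots_lines))
        # Honour a Crawl-delay directive if robots.txt asks for a longer gap.
        declared = self.robots.crawl_delay(user_agent)
        self.min_delay = max(min_delay, float(declared or 0))
        self._last_request: Optional[float] = None

    def get(self, url: str) -> Optional[str]:
        if not self.robots.can_fetch(self.user_agent, url):
            return None  # disallowed by robots.txt: skip it rather than fail
        for attempt in range(self.max_retries):
            self._wait_turn()
            try:
                return self.fetch(url)
            except OSError:
                # Treat OSError (urllib's URLError subclasses it) as transient:
                # back off exponentially before the next attempt.
                self.sleep(self.min_delay * 2 ** attempt)
        raise RuntimeError(f"giving up on {url} after {self.max_retries} attempts")

    def _wait_turn(self) -> None:
        # Sleep just long enough to keep min_delay between consecutive requests.
        now = self.clock()
        if self._last_request is not None:
            remaining = self.min_delay - (now - self._last_request)
            if remaining > 0:
                self.sleep(remaining)
        self._last_request = self.clock()
```

In real use `fetch` could be something like `lambda u: urllib.request.urlopen(u, timeout=10).read().decode()`, and `robots_lines` would come from the site's `/robots.txt`. Injecting `sleep` and `clock` keeps the throttling logic testable; the harder items on the list (layout changes, logins, long-running process supervision) don't reduce to a snippet like this.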