
Using a headless browser for scraping is a lot slower and more resource-intensive than parsing HTML.
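For comparison, here is a minimal sketch of the "just parse the HTML" approach. It uses only Python's standard-library `html.parser` and runs no JavaScript. The link-extraction task and the names `LinkExtractor` and `extract_links` are hypothetical, chosen only to illustrate the idea.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in a static HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Parse an HTML string and return every <a href> it contains, in order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<html><body><a href="/item?id=1">one</a> <a href="/item?id=2">two</a></body></html>'
    print(extract_links(page))
```

This only works when the data is already present in the server's HTML response; anything rendered client-side by JavaScript will be missing.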



I don't find this to be a concern. In all the scraping I've done, the only bottleneck was the intentional throttling/rate limiting, not the speed or resources consumed by the headless browser. A small, cheap machine could easily process many, many times more requests than it would be reasonable to make while crawling.
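To make the "throttling is the bottleneck" point concrete, here is a minimal sketch of a fixed-interval throttle in Python. The class name `Throttle` and the `min_interval` parameter are hypothetical. Real crawlers often also honor `robots.txt` crawl delays or per-host limits, which this sketch does not do.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the throttle can be tested without real waiting
        self._sleep = sleep
        self._last = None      # time the previous request was allowed through

    def wait(self) -> float:
        """Block until the next request is allowed; return how long we slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


if __name__ == "__main__":
    throttle = Throttle(min_interval=2.0)
    for url in ["https://example.com/a", "https://example.com/b"]:
        throttle.wait()
        print("would fetch", url)  # the fetch itself (browser or plain HTTP) goes here
```

With a 2-second interval the loop handles at most one request every two seconds, regardless of whether each fetch takes 50 ms with a plain HTTP client or 500 ms with a headless browser.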


Sure, but it might be the only way to get the data.


It might be, but _starting_ a scraping project with a headless browser might be excessively expensive if you don't need the additional features.


"Only" is a bit of an overstatement. The data is always coming from somewhere; it just depends on how much effort is needed to reverse-engineer the JavaScript code path to the data.
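One common place the data turns out to be "coming from" is a JSON blob embedded in the page's initial HTML, which the client-side JavaScript then renders. (Another is a JSON endpoint the page fetches, which you can usually spot in the browser's network tab.) Below is a minimal sketch, assuming the target page embeds its data in a `<script type="application/json">` tag. The helper names are hypothetical, and real sites vary widely in where and how they embed such data.

```python
import json
from html.parser import HTMLParser


class JsonScriptExtractor(HTMLParser):
    """Collect and decode the contents of <script type="application/json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json = False
        self._buf = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        # Only start buffering for script tags explicitly marked as JSON
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json = True
            self._buf = []

    def handle_data(self, data):
        # Script contents arrive as raw text; buffer them in case they are split
        if self._in_json:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json:
            self.payloads.append(json.loads("".join(self._buf)))
            self._in_json = False


def extract_embedded_json(html: str) -> list:
    """Return the decoded JSON payloads embedded in the page, in document order."""
    parser = JsonScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.payloads
```

When this works, it gives you the same data the headless browser would render, at the cost of a single plain HTTP request and no JavaScript execution.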

