
Tangentially related question:

Is Python still the most common tool used for web scraping, and if so, what's the advantage over jsdom/cheerio or, say, a headless-browser-based tool like Puppeteer?

I've been using these tools for years, but I grew up in the JS world, so I'd be curious to hear from people with different backgrounds/biases than mine :)




> Is Python still the most common tool used for web scraping, and if so, what's the advantage over jsdom/cheerio or, say, a headless-browser-based tool like Puppeteer?

I'm a bit of a Python zealot, but if you prefer JS then use JS. The best tool is the one you know.

I think Python became the scraping language because many people thought it was significantly easier to use than Perl, and Perl was on top because it was significantly easier than shell scripting. A language that is only incrementally easier, or better in other respects, won't inspire people to learn something new. If we had Node 15 years ago, maybe it would have been JS.

As far as headless browsers go, Selenium has official Python bindings. Though I kind of consider that cheating.

For my personal taste, I choose Python because I can write it off the top of my head without making syntax errors, or leaning on an IDE. It's the only language I can do that with, and I've been able to do it since about a month into using it.


For low-level stuff where you don't need the overhead of Puppeteer, I doubt that there's a better solution. I do pretty much everything with httpx (https://www.python-httpx.org) these days.
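For anyone wondering what "low-level" looks like in practice, here's a rough sketch, not anything canonical: fetch a page with httpx and pull out the links with the standard library's html.parser. The names LinkExtractor, extract_links and fetch are made up for illustration, and the timeout and redirect settings are just assumptions about what you'd want.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string (hypothetical helper)."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url):
    """Download a page with httpx (hypothetical helper).

    httpx is a third-party package (pip install httpx); it is imported here
    so the parsing half above works without it installed.
    """
    import httpx

    # Assumed settings: follow redirects and give up after 10 seconds
    response = httpx.get(url, follow_redirects=True, timeout=10.0)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text
```

Something like extract_links(fetch("https://example.com")) then gives you a plain list of hrefs, with no browser involved. Once the page needs JavaScript to render, this stops working and you're back to a headless browser.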



