
I actually don't mind scraping it that much, and even enjoy the adversarial aspects of writing scrapers. But it makes a lot of high-level functionality like filtered streams or historical search much less accessible; I'd probably never have learned a lot of network analysis stuff over the last decade or so if I'd had to pay to access streaming data first. Also, I think it's going to be harder for academic researchers to get institutional approval to scrape adversarially, so it could put a dent in a lot of social science research by forcing people to chase grants instead of focusing on their code.



It's ironic: for a couple of (non-Twitter) projects I wound up scraping because either a) they didn't have an API yet (e.g. early crypto pricing sites) or b) I wasn't confident the API would remain intact over the long term. Kind of depressing.


> I think it's going to be harder for academic researchers to get institutional approval to scrape adversarially

This is a good point about what might happen, but it seems worthwhile to address and fix directly. Personally I don't see why adversarial scraping of a publicly published website should require any more ethical consideration/review than using the suggested API would. Ethical concerns should revolve around humans, not the business desires of non-human entities.


Also, quality residential proxies are pricey. You need to rate-limit and rotate both the IP and the Puppeteer fingerprint when adversarially scraping.
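
For what it's worth, the rate-limit-plus-rotation part can be roughed out in a few lines. This is only a sketch of the idea, not anyone's production setup: `Identity`, `RotatingThrottle`, the proxy URLs and the user-agent strings are all made up for illustration, and a real browser fingerprint covers far more than a user agent.

```python
import itertools
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """One proxy paired with one fingerprint (both hypothetical)."""
    proxy: str       # placeholder proxy URL, not a real endpoint
    user_agent: str  # stand-in for a fuller browser fingerprint


class RotatingThrottle:
    """Hands out identities round-robin, at least min_interval seconds apart."""

    def __init__(self, identities, min_interval,
                 clock=time.monotonic, sleep=time.sleep):
        if not identities:
            raise ValueError("need at least one identity")
        self._cycle = itertools.cycle(identities)
        self._min_interval = min_interval
        self._clock = clock    # injectable so the timing can be tested
        self._sleep = sleep
        self._last = None      # time of the previous hand-out, if any

    def next_identity(self):
        now = self._clock()
        if self._last is not None:
            wait = self._min_interval - (now - self._last)
            if wait > 0:
                # Too soon since the last request: block until the gap is met.
                self._sleep(wait)
                now = self._clock()
        self._last = now
        return next(self._cycle)


# Assumed example values; swap in your own proxies and fingerprints.
identities = [
    Identity("http://proxy-a.example:8080", "UA-1"),
    Identity("http://proxy-b.example:8080", "UA-2"),
]
throttle = RotatingThrottle(identities, min_interval=2.0)
first = throttle.next_identity()  # returns immediately with the first identity
```

Each returned identity would then be handed to whatever browser or HTTP client is doing the fetching. Round-robin with a fixed gap is the simplest possible policy; randomized delays and per-proxy budgets are obvious next steps, but they depend on the target.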



