
Hi Sam,

It might be worth adding a section on distributed anonymous scrapers that use some form of messaging middleware to distribute the URLs to scrape. For the anonymity part (which is independent of job distribution, of course), you could walk readers through using https://github.com/aaronsw/pytorctl or even a rotating Tor proxy. This is how I scraped all those Instagram locations + metadata we talked about roughly five years ago. Hope you’re doing well!
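
To make the idea concrete, here is a rough sketch of the job-distribution half, not the setup I actually ran. It assumes an in-process `queue.Queue` standing in for real messaging middleware, and the names `run_workers`, `fetch`, and `TOR_PROXIES` are made up for illustration. The proxy address assumes Tor's default local SOCKS port (9050).

```python
import queue
import threading
from typing import Callable, Dict, List

# Assumption: a local Tor client listening on its default SOCKS port.
# This mapping has the shape the `requests` library accepts for `proxies`
# (SOCKS support needs the requests[socks] extra). It is not used below.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}


def run_workers(
    urls: List[str],
    fetch: Callable[[str], str],
    num_workers: int = 4,
) -> Dict[str, str]:
    """Distribute URLs to worker threads through a shared queue.

    The queue is a stand-in for messaging middleware: workers pull the
    next URL whenever they are free, so load balances itself.
    """
    jobs: "queue.Queue[str]" = queue.Queue()
    for url in urls:
        jobs.put(url)

    results: Dict[str, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                url = jobs.get_nowait()
            except queue.Empty:
                return  # no work left, worker exits
            try:
                body = fetch(url)
            except Exception as exc:  # record the failure and keep going
                body = f"error: {exc}"
            with lock:
                results[url] = body

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For the anonymity half, `fetch` could be something like `lambda u: requests.get(u, proxies=TOR_PROXIES, timeout=30).text`, assuming `requests` with SOCKS support is installed. Rotating circuits between batches is a separate step that goes through Tor's control port, which is what a library like pytorctl is for.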



