
My impression is that Facebook and Twitter have really strong anti-scraping measures. Is that wrong? And are there any reliable scraping services that can actually scrape those large companies' sites at a reasonable cost?



One thing to note about FetchFox: it runs as a Chrome extension. This means it interacts with anti-scraping measures differently than cloud-based tools do.

For one thing, many (most? all?) large sites ban Amazon IPs from accessing their websites. This is not a problem for FetchFox, since requests come from your own browser and your own IP.
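
As a rough illustration of how that kind of ban can work on the site's side, here is a minimal sketch that checks a client IP against a list of datacenter CIDR blocks. The ranges below are reserved documentation blocks used as placeholders, not real Amazon ranges, and `is_datacenter_ip` is a hypothetical helper name. Real sites likely combine this with other signals.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT real Amazon ranges.
# A real deployment would load an up-to-date list of cloud provider ranges instead.
DATACENTER_CIDRS = ["203.0.113.0/24", "198.51.100.0/24"]
DATACENTER_NETWORKS = [ipaddress.ip_network(cidr) for cidr in DATACENTER_CIDRS]


def is_datacenter_ip(ip: str) -> bool:
    """Return True if the address falls inside any listed datacenter block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_NETWORKS)


# A request from inside a listed block would be rejected; others pass through.
blocked = is_datacenter_ip("203.0.113.7")
allowed = not is_datacenter_ip("192.0.2.1")
```

A request made from a browser extension comes from the user's own residential or office IP, so it would not match a list like this.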

Also, with FetchFox, you can scrape a logged-in session without exposing any sensitive information. Your login tokens/passwords are never exposed to any third-party proxy like they would be with cloud scraping. And if you use your own OpenAI API key, the extension developer (me) never sees any of the activity in your scraping. OpenAI does see it, however.

> And are there any reliable scraping services that can actually scrape those large companies' sites at a reasonable cost?

FetchFox :).

But besides that, the gold standard for scraping is proxied mobile IP requests. There are services that let you make requests which appear to come from a mobile IP address. These are very hard for big sites to block, because mobile providers aggregate many customer requests together.
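
To make the proxied request idea concrete, here is a minimal sketch using only Python's standard library. The proxy URL, credentials, and the `build_proxy_opener` name are all hypothetical; an actual provider would give you its own endpoint and auth scheme.

```python
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS requests through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical mobile proxy endpoint; replace with the one your provider issues.
PROXY_URL = "http://user:pass@mobile-proxy.example:8000"
opener = build_proxy_opener(PROXY_URL)

# Uncomment to make a real request through the proxy (needs a working endpoint):
# with opener.open("https://example.com", timeout=10) as resp:
#     html = resp.read()
```

The target site then sees the proxy's mobile IP rather than yours, which is what makes these requests hard to block.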

The downside is mainly cost. Also, the providers in this space can be semi-sketchy, depending on how they get the proxy bandwidth. Some employ spyware, or embed proxies into mobile games without user knowledge/consent. Beware what you're getting into.



