It's hilarious to think there exist people who believe Googlebot does not get special treatment from website operators. Here's an experiment you can do in a jiffy: write a script that crawls any major website and see how many URL fetches it takes before your IP gets blocked.
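
Rough sketch of what I mean (the URL, user agent, and pacing are all placeholders; any big site with bot protection will do):

    import time
    import urllib.error
    import urllib.request

    URL = "https://example.com/"  # placeholder; substitute a large site

    fetches = 0
    while True:
        req = urllib.request.Request(
            URL, headers={"User-Agent": "toy-crawler/0.1"}
        )
        try:
            urllib.request.urlopen(req).close()
        except urllib.error.HTTPError as e:
            # 403/429 usually means the anti-bot layer noticed you;
            # some sites just drop the connection instead
            print(f"blocked after {fetches} fetches (HTTP {e.code})")
            break
        fetches += 1
        time.sleep(0.1)  # even polite pacing rarely helps without an allowlist entry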

Googlebot has a range of IP addresses that it publicly announces so websites can whitelist them.




> Googlebot has a range of IP addresses that it publicly announces so websites can whitelist them.

Google says[1] they do not do this:

"Google doesn't post a public list of IP addresses for website owners to allowlist."

[1] https://developers.google.com/search/docs/advanced/crawling/...


From that same page, they recommend using a reverse DNS lookup (and then a forward DNS lookup on the returned hostname) to validate that it is Googlebot. So the effect is the same for anyone trying to impersonate Googlebot (unless they can attack the DNS resolution of the site they're scraping, I guess).
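
If anyone wants to implement that check, a minimal sketch of the reverse-then-forward dance in Python (the googlebot.com/google.com suffixes are the ones Google's docs mention; the sample IP is just one from Googlebot's historic range and may not verify forever):

    import socket

    def is_googlebot(ip: str) -> bool:
        # 1. Reverse DNS: the PTR record should sit under googlebot.com or google.com
        try:
            host, _, _ = socket.gethostbyaddr(ip)
        except socket.herror:
            return False
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # 2. Forward DNS on that hostname must map back to the original IP,
        #    otherwise anyone controlling their own PTR record could spoof it
        try:
            addrs = {info[4][0] for info in socket.getaddrinfo(host, None)}
        except socket.gaierror:
            return False
        return ip in addrs

    print(is_googlebot("66.249.66.1"))  # sample address from Googlebot's range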


I don't whitelist Googlebot, but I don't block it either, because its crawler is fairly slow and unobtrusive. Other crawlers seem determined to download the entire site in 60 seconds, and then download it again, and again, until they get banned.
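
The ban those aggressive crawlers run into is usually nothing fancier than a per-IP sliding-window counter, something like this (the thresholds are made up):

    import time
    from collections import defaultdict, deque

    WINDOW = 60      # seconds; made-up threshold
    MAX_HITS = 100   # requests allowed per window; also made up

    hits = defaultdict(deque)
    banned = set()

    def allow(ip: str) -> bool:
        """Sliding-window check: ban any IP exceeding MAX_HITS per WINDOW."""
        if ip in banned:
            return False
        now = time.monotonic()
        q = hits[ip]
        q.append(now)
        while q and now - q[0] > WINDOW:   # drop hits outside the window
            q.popleft()
        if len(q) > MAX_HITS:
            banned.add(ip)
            return False
        return True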


I have almost never had that problem running Screaming Frog on big-brand sites, apart from one or two times.


I don't scrape websites often, but when I do, I use the user agent of a major browser.


Do any of them intersect with Google Cloud IP addresses? If so, set up a VPN server on Google Cloud.



