
Impressive feat! Does Reddit have rate limiting or other hurdles in place, similar to the hoops youtube-dl has to jump through? Curious what your thoughts are about maintaining a project like that.



As history has shown, you can only do so much to stop this. If you perfectly mimic Googlebot and use Google IP ranges by hosting on Google Cloud, the site either takes an SEO hit or lets you bot it at the end of the day. Googlebot itself looks like a DDoS attack a lot of the time, too.
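
Roughly, the user-agent half of that mimicry is just a header (the IP half means actually running on Google Cloud). A minimal sketch; the UA string is the one Google documents for Googlebot, and the target URL is a placeholder:

    import requests

    # Present as Googlebot via the documented UA string.
    # The target URL here is a placeholder.
    GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                    "+http://www.google.com/bot.html)")

    resp = requests.get("https://example.com/some/page",
                        headers={"User-Agent": GOOGLEBOT_UA},
                        timeout=10)
    print(resp.status_code)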

You can also go the route of looking like a pool of users; then it's just a game of cat and mouse, and one that providers don't really have time to play.
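
The "pool of users" approach is basically rotating proxies and user agents per request. A sketch, assuming you have a proxy pool to hand (the proxy addresses below are placeholders):

    import itertools
    import requests

    # Rotate each request through a different proxy and user agent.
    # Proxy addresses are placeholders for a real pool.
    PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    ]

    proxy_cycle = itertools.cycle(PROXIES)
    ua_cycle = itertools.cycle(USER_AGENTS)

    def fetch(url):
        proxy = next(proxy_cycle)
        return requests.get(url,
                            headers={"User-Agent": next(ua_cycle)},
                            proxies={"http": proxy, "https": proxy},
                            timeout=10)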


> As history has shown, you can only do so much to stop this.

History has shown you can stop this well enough. Try scraping e.g. Instagram; Bibliogram attempted it, and the project is now discontinued.


This is true for sites that don't care about SEO. Reddit cares very, very much about SEO, so it can never truly block bots.


The Google scraper IPs are very different, no?

> If you perfectly mimic the GoogleBot and use google IP ranges by hosting on google cloud


You are right, apparently they do publish the ranges: https://developers.google.com/search/docs/crawling-indexing/...
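
Those docs point at a machine-readable list of Googlebot ranges, so a site can check a client IP against them. A sketch; the exact JSON URL and its "prefixes"/"ipv4Prefix" keys are assumptions based on that page:

    import ipaddress
    import requests

    # Check whether an IP falls inside Google's published Googlebot ranges.
    # The JSON URL and key names are assumptions based on the linked docs.
    RANGES_URL = ("https://developers.google.com/static/search/"
                  "apis/ipranges/googlebot.json")

    def is_googlebot_ip(ip):
        prefixes = requests.get(RANGES_URL, timeout=10).json()["prefixes"]
        addr = ipaddress.ip_address(ip)
        # Mixed v4/v6 comparisons simply return False, so no filtering needed.
        return any(
            addr in ipaddress.ip_network(p.get("ipv4Prefix") or p.get("ipv6Prefix"))
            for p in prefixes
        )

    print(is_googlebot_ip("66.249.66.1"))  # an often-cited Googlebot range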


I've only done some rudimentary rate-limiting checks, and it doesn't seem like they do. Though I haven't pushed it far (~1,000 rpm).
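
For context, a rudimentary check like that can be as simple as firing requests at a steady pace and watching for 429s. A sketch; the endpoint and rate are placeholders:

    import time
    import requests

    # Crude rate-limit probe: request at a fixed pace, stop at the first 429.
    # Endpoint and rate are placeholders.
    URL = "https://www.reddit.com/r/programming.json"
    RATE_PER_MIN = 1000

    for i in range(RATE_PER_MIN):
        r = requests.get(URL, headers={"User-Agent": "rate-probe/0.1"},
                         timeout=10)
        if r.status_code == 429:
            print(f"rate limited after {i} requests")
            break
        time.sleep(60.0 / RATE_PER_MIN)
    else:
        print("no rate limiting observed at this rate")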

In any case, my plan is to deal with it if it becomes a problem.



