
As a side note, I have quite a bit of experience trying to block automated scraping services, and I found that the best approach is to quietly detect the scraping and then serve up tainted data.

In our case, competitors were scraping pricing data in order to competitively price their products without having to do the work.

So we just randomly started giving them incorrect prices on every few products. Not only did that make the whole data set useless, they also had no way of figuring out which data was correct without checking manually. And since we didn't do it to everything and started at random intervals, it was too difficult for them to figure out when their IP had actually been quietly blacklisted.




What's ironic is that most of the sites with anti-scraping protection also do scraping of their own.

E.g. Amazon and Walmart both do a lot of their own scraping.


Really going to call for a [citation needed] on that "most"!


Maybe I should rephrase to “put the most effort into anti-scraping”.

Every major ecommerce site scrapes; it would be a competitive disadvantage if they didn’t.


Something like that could only work if prices are scraped from a single source. If multiple sources are used, they could just compare prices and exclude the ones that fall way off. So my guess is your site is an edge case.
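
To make the cross-checking idea concrete, here is a minimal sketch of how a scraper with several sources might drop prices that fall way off. It assumes the median across sources is a fair reference price; the function name `drop_outliers` and the 10% `tolerance` are hypothetical choices for illustration, not anything described in this thread.

```python
from statistics import median
from typing import Dict


def drop_outliers(prices: Dict[str, float], tolerance: float = 0.10) -> Dict[str, float]:
    """Keep only the sources whose price is within `tolerance` of the median."""
    # The median is used as the reference because a single tainted
    # source cannot drag it far when there are three or more sources.
    reference = median(prices.values())
    return {
        source: price
        for source, price in prices.items()
        if abs(price - reference) <= tolerance * reference
    }


# Hypothetical prices for one product, scraped from four sources.
scraped = {"site_a": 100.0, "site_b": 102.0, "site_c": 98.0, "site_d": 140.0}
trusted = drop_outliers(scraped)  # site_d is excluded
```

With only one source the median is that source's own price, so nothing can be excluded, which is the edge case described above.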


What do you do if you can't detect the scraping? And if you do detect scraping, how do you ensure the data you provide them is both invalid and consistent?


I used a multiplier that was calculated using the date, a static secret, and a seed hashed from the SKU. So it was consistent, but the offset differed from product to product. That way, even if you manually went in and figured out the offset for a specific product, you couldn't just apply that offset to all scraped prices.

But once they lose a bunch of money the first time, they tend to stop trying. We tracked down one competitor that was mirroring our prices on an hourly basis. So we waited until late at night, tanked our price on a few expensive items, then placed orders on the competitor's site.

The human touch tends to scare off scrapers faster than a technological fence anyway.
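
For anyone curious, the multiplier described above can be sketched roughly like this. It is only an illustration under assumptions: the use of HMAC-SHA256, the 15% maximum offset, and the names `SECRET`, `taint_multiplier`, and `tainted_price` are all hypothetical, and the actual calculation may well have been different.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical static secret; in practice this would live in server config.
SECRET = b"replace-with-a-real-secret"


def taint_multiplier(sku: str, day: date, secret: bytes = SECRET,
                     max_offset: float = 0.15) -> float:
    """Return a multiplier between 1 - max_offset and 1 + max_offset.

    The value is stable for a given SKU, day, and secret, but differs
    from product to product.
    """
    # Seed hashed from the SKU.
    sku_seed = hashlib.sha256(sku.encode("utf-8")).digest()
    # Mix the seed with the date under the static secret (assumed: HMAC).
    message = sku_seed + day.isoformat().encode("ascii")
    mac = hmac.new(secret, message, hashlib.sha256).digest()
    # Map the first 8 bytes of the MAC to a fraction between 0 and 1.
    fraction = int.from_bytes(mac[:8], "big") / 2**64
    # Spread that fraction symmetrically around 1.0.
    return 1.0 + (2.0 * fraction - 1.0) * max_offset


def tainted_price(price_cents: int, sku: str, day: date) -> int:
    """Price in cents to show a client that has been flagged as a scraper."""
    return round(price_cents * taint_multiplier(sku, day))
```

Because the secret never leaves the server, working out the offset for one SKU by hand says nothing about the offset for any other SKU.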



