
Well, I am not arguing that point, because I am not doing unethical scraping anyway. I am just trying to explain why most scrapers resort to hammering the servers directly.

Additionally, in my local market the owners of e-commerce websites are extremely narrow-minded and have zero tech education, so all they will ever hear from you is "I want to steal that guy's data", which is of course not true at all. But try arguing with a 50-year-old guy with the mindset of a feudal master who has never truly worked in his life but wants to control how everybody around him works.

If the survival of my business were at stake, I would just scrape one page every 3 or so seconds as a reasonable compromise. In fact I have done so for my amateur scraping experiments, although there the delay was even longer: 10 seconds per page.
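
For what it's worth, here is a minimal sketch of that kind of throttling: at most one page per fixed interval. The names `crawl_politely`, `fetch`, and `delay_seconds` are made up for illustration, and `fetch` is whatever download function you already use (nothing here assumes a particular HTTP library).

```python
import time
from typing import Callable, Iterable, List


def crawl_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Fetch each URL in order, starting at most one request per delay_seconds.

    `fetch` is a hypothetical caller-supplied function (URL in, page body out).
    `sleep` and `clock` are injectable only so the pacing can be tested
    without actually waiting.
    """
    pages: List[str] = []
    last_start = None  # start time of the previous request, if any
    for url in urls:
        if last_start is not None:
            # Wait out whatever is left of the interval since the last request began.
            remaining = delay_seconds - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        pages.append(fetch(url))
    return pages
```

Passing `delay_seconds=10.0` gives the slower pace. The delay is measured from the start of the previous request, so a slow page load counts toward the wait instead of adding to it.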




I did a project in the past that involved scraping non-mainstream e-commerce sites and we encountered this mindset. Durr, what? Yer want to take all mah data?? It ended up easier to just write the scrapers than to explain what we were doing to Neanderthals.


> Durr, what? Yer want to take all mah data?? It ended up easier to just write the scrapers than to explain what we were doing to Neanderthals.

As demeaning and offensive as many people would find that statement, I still found it to be the sad reality most of the time.

Plus, my local community is much smaller, and I would not want vengeful businessmen who understand NOTHING of what I am trying to achieve to actively sabotage me. They could easily call my ISP and get me denied service, for example.

So I opted for ethical scraping without asking questions. Seems to be the best working compromise.

Thanks for sharing your experience. Let's bathe in the confirmation bias it dips us in. :D


I forgot about the managers! They get to have jobs this way.



