
Just 2 buggy crawlers doesn't seem like many. Sure, each had a large impact, but given that there are likely hundreds if not thousands of such crawlers out there, it's a rather small number. It seems that most crawlers are actually respectful.



I used to run a site with a huge number of pages that had high running costs but low revenue.

The only web crawler that did anything for me was Google, as Google sent an appreciable amount of traffic. Referrers from Bing were almost undetectable: the joke among my black hat SEO friends at the time was that you could rank for money keywords like "buy wow gold" and get 10 hits. Then there were the Chinese crawlers like Baidu that would crawl at 10x the rate of Google but send zero referrers. And then there were crawlers looking for copyrighted images that cost me money to accommodate even if they never sent me cease and desist letters.

As much as I hate the Google monopoly I couldn't afford having my site crawled like that without any benefit to me.

It's an awful situation for the long term though because it prevents new entrants. Right now I am thinking about a new search engine for a vertical where a huge number of products are available from different vendors and when you do find results from Google they are sold out at least 70% of the time. I hate to think it's going to get harder to make something.


How many foundation AI companies are there? 2 is a pretty big chunk of that pie.


I think there are a thousand wannabe companies all trying to suck up as much data as they can; not a sustainable situation in any way.


There was a paper about web crawlers circa 2000 that pointed out that the vast majority of academics who ran web crawlers never published a paper based on their work.


Sounds like all the kids who want to make a video game and start by building a game engine.


What makes you think that only a dozen or so foundation AI companies are scraping?



