
Yep - that's exactly it. Setting up the infrastructure to handle large, web-scale content analysis is the real challenge. (Shameless plug alert) That's why we set up 80legs: to give everyone not named Google/Yahoo/Microsoft comparable capabilities for this kind of work.



Right now we re-crawl for each user, but as we scale up we're going to start caching results and providing data streams.
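For anyone curious what that caching layer might look like, here's a minimal sketch: a shared, URL-keyed cache consulted before any re-crawl. The `fetch` helper, the in-memory dict, and the 24-hour TTL are all my own assumptions for illustration, not 80legs' actual design.

    import time
    import urllib.request

    CACHE_TTL = 24 * 60 * 60  # seconds; assume a page is re-crawled at most daily
    _cache = {}  # url -> (fetched_at, content), shared across all customers

    def fetch(url):
        """Stand-in for the actual HTTP crawl of a single page."""
        with urllib.request.urlopen(url) as resp:
            return resp.read()

    def get_page(url):
        """Serve from the shared cache when fresh; crawl only on a miss."""
        now = time.time()
        entry = _cache.get(url)
        if entry and now - entry[0] < CACHE_TTL:
            return entry[1]       # cache hit: no crawl cost for this request
        content = fetch(url)      # cache miss: crawl once, reuse for everyone
        _cache[url] = (now, content)
        return content

In practice a distributed store would replace the dict, but the point is the same: the first customer's request pays for the crawl, and later requests are served from the cache.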


I'd hate to see your power & network bill :)

But I think that once you have enough customers, the cost of crawling goes down with every new customer you sign up, because you only need to crawl a page once and can sell the crawled result to many customers. Or am I misreading your model, and is every page crawled over and over again for every user?
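To make the amortization argument concrete, here's a back-of-the-envelope sketch; the per-page cost and page counts are made-up numbers, purely for illustration.

    # Illustrative numbers only: assumed $0.001 per crawled page.
    crawl_cost_per_page = 0.001
    pages = 1_000_000

    # Without caching, every customer pays for a full crawl.
    def cost_without_cache(customers):
        return crawl_cost_per_page * pages * customers

    # With caching, the crawl happens once and is resold.
    def cost_with_cache(customers):
        return crawl_cost_per_page * pages

    for n in (1, 10, 100):
        print(n, cost_without_cache(n) / n, cost_with_cache(n) / n)
    # Per-customer crawl cost stays flat without caching,
    # but falls as 1/n once results are shared.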



