Well, I toyed around with writing my own search engine (I have a pretty good proof of concept), but in the end the bandwidth costs would have been prohibitive... so there :)
The funny thing is that while coding that stuff, the bigger problems were financial and the enormous amount of cruft that is the web. The actual search engine wasn't that hard at all.
Yep - that's exactly it. Setting up the infrastructure to handle large, web-scale content analysis is the real challenge. (Shameless plug alert) That's why we set up 80legs: to give everyone not called Google/Yahoo/Microsoft comparable capabilities when it comes to this.
But I think that once you have enough customers, the cost of crawling goes down for every new customer you sign up, because you only need to crawl a page once and you can sell the crawled result to many customers. Or am I misreading your model, and is every page crawled over and over again for every user?
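The amortization argument here can be sketched with some illustrative numbers (all figures below are hypothetical, not actual 80legs pricing):

```python
# Hypothetical illustration of crawl-cost amortization:
# if a page is crawled once and the result is sold to many
# customers, the crawl cost per customer falls as more sign up.

CRAWL_COST_PER_PAGE = 0.0001  # assumed cost in dollars per page fetched
PAGES = 1_000_000             # assumed size of the shared crawl

def cost_per_customer(num_customers: int) -> float:
    """Total crawl cost split evenly across customers sharing one crawl."""
    return CRAWL_COST_PER_PAGE * PAGES / num_customers

for n in (1, 10, 100):
    print(f"{n} customers -> ${cost_per_customer(n):.2f} each")
# 1 customer bears the full $100.00; 100 customers pay $1.00 each.
```

The opposite model, crawling every page fresh for every customer, would keep the per-customer cost flat at the full $100.00 regardless of how many customers sign up.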
This intrigues me. I had imagined the long-tail queries were the really hard part. I mean, the places where Google succeeds and Bing fails, or vice versa, seem to me the "gaps" where for whatever reason it's difficult to get things right, whether for spam reasons or scoring difficulties.