But how is this so different from the index servers? No map/reduce was required to look up a query in the index servers; it was only used to /build/ the index servers. I can see that this new method allows for more rapid modification of the index servers, but I don't see how it is related to Google Instant.
Good points, thanks. Without more technical detail, it does look like the BigTable-type datastore is being updated while being used for search. In Lucene/Nutch/etc. apps, I think it's common to build one index offline, then swap it in for the live index (you don't need to do it this way, but it sometimes makes sense). Google is likely getting rid of the index-swapping part of the process by having a large persistent datastore that is continually indexed.
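For what it's worth, the offline-build-then-swap pattern described above can be sketched in a few lines. This is plain Python, not Lucene (which handles this internally); the file-per-term "index" and all the names here are illustrative only, just to show the atomic-swap idea:

```python
import os
import tempfile

def build_index(docs, index_dir):
    """Offline step: write a toy inverted index (one file per term) to index_dir."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    os.makedirs(index_dir, exist_ok=True)
    for word, ids in index.items():
        with open(os.path.join(index_dir, word), "w") as f:
            f.write("\n".join(sorted(ids)))

def swap_live_index(new_dir, live_link):
    """Atomically repoint the 'live' symlink at the freshly built index."""
    tmp_link = live_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_dir, tmp_link)
    os.replace(tmp_link, live_link)  # rename is atomic on POSIX

def search(live_link, word):
    """Serving side: always reads through the live symlink."""
    path = os.path.join(live_link, word.lower())
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return f.read().splitlines()
```

Searchers never see a half-built index: they read through the symlink, which flips from the old index to the new one in a single atomic rename. Caffeine's point, as I read it, is to skip this whole rebuild/swap cycle by mutating one persistent index in place.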
Have you seen Lucandra? It's Cassandra-backed Lucene that gives you soft real-time updates. I'm sure Google's infrastructure is far more advanced, but the idea of real-time indexing for big data has been around for a while, and is available in OSS projects today.
Furthermore, if you look up an article TFA links to: http://www.theregister.co.uk/2010/09/09/google_caffeine_expl... , you see quotes like this:
"Google distinguished engineer Ben Gomes told The Reg that Caffeine was not built with Instant in mind, but that it helped deal with the added load placed on the system by the 'streaming' search service."
"Lipkovitz makes a point of saying that MapReduce is by no means dead. There are still cases where Caffeine uses batch processing, and MapReduce is still the basis for myriad other Google services."
I can't wait for the technical papers on Caffeine either, but to me it does not seem closely related to Google Instant; it's just part of Google's desire to respond more rapidly to changing content on the internet. Which is totally OK with me. But then people shouldn't claim they dumped MapReduce in order to implement Google Instant.