But how is this so different from the index servers? No map/reduce was required to look up a query in the index servers; it was only used to /build/ the index servers. I can see that this new method allows for more rapid modification of the index servers, but I don't see how it is related to Google Instant.
Good points, thanks. Without more technical detail, it does look like the BigTable-type datastore is being updated while being used for search. In Lucene/Nutch/etc. apps, I think it's common to build one index offline, then swap it in for the live index (you don't need to do it this way, but it sometimes makes sense). Google is likely getting rid of the index-swapping part of the process by having a large persistent datastore that is continually indexed.
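For what it's worth, the offline-build-then-swap pattern described above can be sketched in a few lines. This is plain Python, not Lucene (which handles this internally); the file-per-term "index" and all the names here are illustrative only, just to show the atomic-swap idea:

```python
import os
import tempfile

def build_index(docs, index_dir):
    """Offline step: write a toy inverted index (one file per term) to index_dir."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    os.makedirs(index_dir, exist_ok=True)
    for word, ids in index.items():
        with open(os.path.join(index_dir, word), "w") as f:
            f.write("\n".join(sorted(ids)))

def swap_live_index(new_dir, live_link):
    """Atomically repoint the 'live' symlink at the freshly built index."""
    tmp_link = live_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_dir, tmp_link)
    os.replace(tmp_link, live_link)  # rename is atomic on POSIX

def search(live_link, word):
    """Serving side: always reads through the live symlink."""
    path = os.path.join(live_link, word.lower())
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return f.read().splitlines()
```

Searchers never see a half-built index: they read through the symlink, which flips from the old index to the new one in a single atomic rename. Caffeine's point, as I read it, is to skip this whole rebuild/swap cycle by mutating one persistent index in place.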
Have you seen Lucandra? It's Cassandra-backed Lucene that gives you soft real-time updates. I'm sure Google's infrastructure is far more advanced, but the idea of real-time indexing for big data has been around for a while, and is available in OSS projects today.
Furthermore, if you look up an article TFA links to: http://www.theregister.co.uk/2010/09/09/google_caffeine_expl... , you see quotes like this:
"Google distinguished engineer Ben Gomes told The Reg that Caffeine was not built with Instant in mind, but that it helped deal with the added load placed on the system by the 'streaming' search service."
"Lipkovitz makes a point of saying that MapReduce is by no means dead. There are still cases where Caffeine uses batch processing, and MapReduce is still the basis for myriad other Google services."
I can't wait for the technical papers on Caffeine either, but to me it does not seem closely related to Google Instant; it's just part of Google's desire to respond more rapidly to changing content on the internet. Which is totally OK with me. But then people shouldn't claim they dumped MapReduce in order to implement Google Instant.