We commonly encounter users who believe that searching on
two fields can be facilitated by creating two separate
indexes on those fields. This example should give you some
intuition about why this just isn’t possible.
Not really, as the example only allows a search on ingredient, since it explicitly says the name is not known at all, and hence could never be used for search.
Now let's say we know two things about the unknown recipe: it contains chicken and it's from the Italian kitchen.
I could definitely use both to speed up my search by looking only at pages that occur in both indexes (in this case, 15 and 1032). This can be done in O(# of indexes * index length) time if both indexes are in sorted order.
You don't even need to look at the entire index. See the point where chicken jumps from 15, to 582? We know we can then skip to at least 582 in all the other indexes we're looking at. (Which is doable as the indexes are going to be stored as some sort of tree we can retrieve a particular range out of) This is pretty much a merge join http://en.wikipedia.org/wiki/Sort-merge_join
Incidentally you can make indexes on Ingredient->Page/Cuisine->Page rather than maintaining Page indexes for specific ingredients/cuisines. So with just two indexes you can filter out any Cuisine/Ingredient combo (You can even do multiple ingredients! As it's an intersection though, multiple cuisines wouldn't be useful as few dishes are likely to fall under multiple cuisines, you could still do it though.)
That wasn't "intuitive" to me either. And what about the simpler case of multiple values of a single index, for example Chicken and Cashews. Any good retrieval system should be able to intersect the lists from the indexes. The existence of these compound indexes leads me to believe there is some limitation.
Not really, as the example only allows a search on ingredient, since it explicitly says the name is not known at all, and hence could never be used for search.
Now let's say we know two things about the unknown recipe: it contains chicken and it's from the Italian kitchen.
Given these two indexes:
I could definitely use both to speed up my search by looking only at pages that occur in both indexes (in this case, 15 and 1032). This can be done in O(# of indexes * index length) time if both indexes are in sorted order.