If the index key does not start with the partition key, then won't that end up b...

jtuple · on July 26, 2011

Currently, the entire keyspace is queried, but querying the entire keyspace does not require touching every partition. Only a covering subset, whose size depends on your N-value (number of replicas), needs to be queried, because the index is replicated alongside your k/v data.

For example, in a 4-partition ring with N=2, keys mapping to p1 are replicated on p1,p2; p2 on p2,p3; p3 on p3,p4; and p4 on p4,p1. As such, you only need to query p1,p3 or p2,p4 to cover the entire keyspace.

In general, approximately RingSize / N partitions need to be queried. The new smart coverage code figures this out, and also deals with routing around failed nodes and other issues.
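
To make the covering-subset idea concrete, here is a minimal sketch of the 4-partition example. It is not Riak's actual coverage code: the function names are made up for illustration, partitions are numbered from 0 (so p1 is 0), and it assumes a healthy ring where keys whose primary is partition p are replicated on p and the next N-1 partitions.

```python
def covered_primaries(q, ring_size, n):
    """Primary partitions whose keys are also stored on partition q.

    Assumes keys with primary p are replicated on p, p+1, ..., p+n-1
    (mod ring_size), so q holds data for q, q-1, ..., q-n+1.
    """
    return {(q - i) % ring_size for i in range(n)}


def covering_partitions(ring_size, n):
    """Pick every n-th partition; together they hold the whole keyspace.

    Hypothetical helper: ignores failed nodes, which the real coverage
    logic has to route around.
    """
    return list(range(0, ring_size, n))


def covers_keyspace(partitions, ring_size, n):
    """Check that the chosen partitions hold every primary's keys."""
    seen = set()
    for q in partitions:
        seen |= covered_primaries(q, ring_size, n)
    return seen == set(range(ring_size))


# 4-partition ring with N=2: partitions 0 and 2 (p1 and p3) are enough.
plan = covering_partitions(4, 2)
print(plan, covers_keyspace(plan, 4, 2))
```

When RingSize is not a multiple of N, this picks one extra partition, which is why the count is only approximately RingSize / N.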

EDIT: Since the replicas value (N) is settable per bucket in Riak, there are some interesting extreme cases that you could envision here. For example, you could have a bucket where N = RingSize, in which case the index is replicated to every partition and you only need to query a single partition to look up values. Of course, you then lose the ability to perform multiple queries in parallel with a more partitioned/distributed index space (which would be more useful for large result sets). As with database systems in general, the best configuration here depends on data and use case.

strmpnk · on July 26, 2011

I assume queries are done over R=1 consistency then? Is W=N the only way to keep writes consistent with these indexes at all times?

jtuple · on July 26, 2011

As far as I know, R=1. Rusty is likely the best person to comment on this, and things may change before/after release, but currently there is no way to specify R for index lookups, and only the minimal set of replicas is queried.

Technically, when you perform a write, Riak will always dispatch to N replicas. W simply requires Riak to confirm W writes before responding to the client. So W=N allows you to know N index sets have been updated, but it's not strictly necessary. At the end of the day, indexes are eventually consistent like the rest of Riak.

strmpnk · on July 26, 2011

Right. I'm just saying that if I write and I want to assume a query after that write will include it, I will need W=N since R=1. Which is fine... but tricky. W=2,R=2,N=3 has been my favorite combination but I guess there are always cases to try other setups.
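
The arithmetic behind that can be sketched as follows. This is only the textbook quorum-overlap rule (a read is guaranteed to touch at least one replica that acknowledged the write when R + W > N), under the simplifying assumption of strict quorums and no node failures; the function name is made up for illustration and is not a Riak API.

```python
def read_overlaps_write(n, r, w):
    """True if any R replicas must include one of the W acknowledged writers.

    Simplified model: N fixed replicas, no failures, no fallback nodes.
    """
    return r + w > n


# Index lookups read from a single replica (R=1), so with N=3:
print(read_overlaps_write(3, 1, 3))  # W=N
print(read_overlaps_write(3, 1, 2))  # W=2
# Ordinary k/v reads with W=2, R=2, N=3:
print(read_overlaps_write(3, 2, 2))
```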

arielweisberg · on July 26, 2011

I get it, very cool.