If the index key does not start with the partition key, then won't that end up being everything in the majority of cases?

In the examples given (price, license plate) there is no locality between the partition keys (armor id, person) and the index key. A query for all armor priced between 200 and 400 would end up touching every partition that contains armor in that price range. Unless the set of armor is small, you will end up needing to scan every partition.




Currently, the entire keyspace is queried, but querying the entire keyspace does not require touching every partition. Only a covering subset, whose size depends on your N-value (number of replicas), needs to be queried, because the index is replicated alongside your k/v data.

For example, in a 4-partition ring with N=2, keys mapping to p1 are replicated on p1,p2; p2 on p2,p3; p3 on p3,p4; and p4 on p4,p1. As such, you only need to query p1,p3 or p2,p4 to cover the entire keyspace.

In general, approximately RingSize / N partitions need to be queried. The new smart coverage code figures this out, and also deals with routing around failed nodes and other issues.
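To make the RingSize / N arithmetic concrete, here is a minimal Python sketch of the idea. It is not Riak's coverage code: the function names are made up for illustration, partitions are numbered from 0 (so p1..p4 above become 0..3), replicas are assumed to sit on the next N-1 partitions around the ring, and failed nodes are ignored.

```python
from math import ceil


def replicas(partition, ring_size, n):
    """Partitions that hold keys whose first replica lands on `partition`.

    Assumption: replicas occupy the next n-1 partitions clockwise.
    """
    return [(partition + i) % ring_size for i in range(n)]


def covering_set(ring_size, n, offset=0):
    """Pick every n-th partition starting at `offset`, wrapping around the ring.

    ceil(ring_size / n) picks are enough so that no gap between two
    consecutive picks is larger than n.
    """
    count = ceil(ring_size / n)
    return sorted({(offset + i * n) % ring_size for i in range(count)})


def covers_keyspace(chosen, ring_size, n):
    """True if every partition's data has at least one replica in `chosen`."""
    chosen = set(chosen)
    return all(chosen & set(replicas(p, ring_size, n)) for p in range(ring_size))


if __name__ == "__main__":
    print(covering_set(4, 2, offset=0))  # [0, 2], i.e. p1,p3
    print(covering_set(4, 2, offset=1))  # [1, 3], i.e. p2,p4
    print(covering_set(4, 4))            # [0], the N = RingSize case
```

The `offset` parameter is only there to show that more than one covering set exists, which is what would let a real implementation route around an unavailable partition.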

EDIT: Since the replicas value (N) is settable per bucket in Riak, there are some interesting extreme cases that you could envision here. For example, you could have a bucket where N = RingSize, in which case the index is replicated to every node and you only need to query a single partition to look up values. Of course, then you lose the ability to perform multiple queries in parallel with a more partitioned/distributed index space (which would be more useful for large result sets). As with database systems in general, the best configuration here depends on data and use case.


I assume queries are done over R=1 consistency then? Is W=N the only way to keep writes consistent with these indexes at all times?


As far as I know, R=1. Rusty is likely the best person to comment on this, and things may change before/after release, but currently there is no way to specify R for index lookups, and only the minimal set of replicas is queried.

Technically, when you perform a write, Riak will always dispatch to N replicas. W simply requires Riak to confirm W writes before responding to the client. So W=N allows you to know N index sets have been updated, but it's not strictly necessary. At the end of the day, indexes are eventually consistent like the rest of Riak.


Right. I'm just saying that if I write and I want to assume a query after that write will include it, I will need W=N since R=1. Which is fine... but tricky. W=2,R=2,N=3 has been my favorite combination but I guess there are always cases to try other setups.
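The reasoning here is the usual quorum overlap rule: a read is guaranteed to hit a replica that acknowledged the write only when R + W > N. A small brute-force Python check of that rule follows. It is a simplified model, not Riak behaviour: the function name is made up, and it pessimistically assumes that only the W acknowledged replicas have the write at read time, even though Riak dispatches the write to all N.

```python
from itertools import combinations


def read_always_overlaps_write(n, r, w):
    """True if every choice of r read replicas shares at least one node
    with every choice of w acknowledged write replicas (out of n)."""
    nodes = range(n)
    return all(
        set(written) & set(read)
        for written in combinations(nodes, w)
        for read in combinations(nodes, r)
    )


if __name__ == "__main__":
    for r, w in [(1, 3), (2, 2), (1, 2)]:
        print(f"N=3 R={r} W={w}:", read_always_overlaps_write(3, r, w))
```

With N=3, the R=1/W=3 and R=2/W=2 combinations always overlap, while R=1/W=2 does not, which is why an index query that reads a single replica needs W=N to be sure of seeing the preceding write.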


I get it, very cool.



