Interesting. We've implemented a metadata layer for HDFS and YARN using NDB (MyS...

ericfrenkiel · on May 20, 2015

- row-level locking -> yes, we use MVCC and take a row-level write lock when necessary for consistency.

- independent transaction coordinators at data nodes -> we have a tier called "aggregators" that act as transaction coordinators. These are the nodes you connect to. Under the hood, leaf nodes in MemSQL also manage transactions.

- pruned index scans -> Do you mean information retrieval? Our indexes support seeks and range scans if that's what you mean.

- network-aware transactions (with user-defined partition keys for tables) -> yes, we have user-defined partition keys (shard keys) and transactions work across multiple nodes on the network.

- any asynchronous/event API -> no, we don't have an event API. Most of our use cases are "pull" oriented, which scales very well with MemSQL.

jamesblonde · on May 21, 2015

Great. Lots of good stuff there. Pruned index scans are index scans where the data is located on a single shard and the index scan doesn't flood all nodes in the DB. I'll definitely be looking into MemSQL.

SkidanovAlex · on May 20, 2015

MemSQL partitions data across nodes by hash, not by range, so partition pruning is less applicable. However, when it can be applied, MemSQL does apply it. [1]
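To make the hash-versus-range point concrete, here is a minimal sketch of how a router could decide which partitions a query has to touch. It is not MemSQL's implementation: the partition count, the CRC32 hash, and the function names are all assumptions made for illustration.

```python
import zlib
from typing import Set

NUM_PARTITIONS = 8  # hypothetical partition count, not a MemSQL default


def partition_for(shard_key: str) -> int:
    """Map a shard key value to a partition using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(shard_key.encode("utf-8")) % NUM_PARTITIONS


def partitions_for_equality(shard_key: str) -> Set[int]:
    """An equality predicate on the shard key can be pruned to a single partition."""
    return {partition_for(shard_key)}


def partitions_for_range(low: str, high: str) -> Set[int]:
    """A range predicate cannot be pruned: hashing destroys key ordering,
    so rows between low and high may live on any partition."""
    return set(range(NUM_PARTITIONS))


if __name__ == "__main__":
    print(partitions_for_equality("inode-42"))     # exactly one partition
    print(partitions_for_range("inode-0", "inode-9"))  # every partition
```

With range partitioning the second function could return a subset, which is why pruning matters more there; with hash partitioning only equality matches on the shard key narrow the fan-out.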

Within each node, for column store tables in MemSQL we do use segment elimination very aggressively, which is effectively the same thing as partition pruning. [2] [3]
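As a rough illustration of segment elimination, the sketch below keeps min/max metadata per segment and skips any segment whose range cannot overlap the query. The `Segment` class, the segment size, and the sorted layout are assumptions for the example, not a description of MemSQL's on-disk format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Segment:
    min_value: int   # smallest value stored in this segment
    max_value: int   # largest value stored in this segment
    rows: List[int]  # the column values themselves


def build_segments(values: Iterable[int], segment_size: int) -> List[Segment]:
    """Sort the values and cut them into fixed-size segments with min/max metadata."""
    ordered = sorted(values)
    segments = []
    for i in range(0, len(ordered), segment_size):
        chunk = ordered[i:i + segment_size]
        segments.append(Segment(chunk[0], chunk[-1], chunk))
    return segments


def scan_range(segments: List[Segment], low: int, high: int) -> Tuple[List[int], int]:
    """Return matching values and the number of segments that had to be read."""
    matched, segments_read = [], 0
    for seg in segments:
        if seg.max_value < low or seg.min_value > high:
            continue  # eliminated using metadata only; rows are never touched
        segments_read += 1
        matched.extend(v for v in seg.rows if low <= v <= high)
    return matched, segments_read


if __name__ == "__main__":
    segs = build_segments(range(100), 10)
    rows, read = scan_range(segs, 25, 34)
    print(len(rows), "rows from", read, "of", len(segs), "segments")
```

The more closely the data is sorted on the filtered column, the tighter each segment's min/max range is and the more segments can be skipped.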

[1] http://docs.memsql.com/latest/concepts/distributed_sql/#inde...

[2] http://docs.memsql.com/latest/concepts/columnar/#query-effic...

[3] http://docs.memsql.com/latest/concepts/columnar/#maintenance...

jamesblonde · on May 21, 2015

Shard key matching is effectively partition pruning - which is great. This is a feature not many people are aware of, but it is super important when scaling to large clusters and when you have "session-oriented" (or in our case inode-oriented) data spread across different tables.